
Physical Chemistry II

1
Errors and Data Treatment
Physical Chemistry II Laboratory

Floralba López González, Solmar Varela

School of Chemical Sciences and Engineering. YachayTech. Ecuador

All measurements are subject to uncertainty, which manifests itself in the errors associated with the values measured for the different parameters or quantities involved in an experiment. These errors can be systematic, random, or both:
• Random Errors: these are the product of random fluctuations in the conditions under which the experiment is carried out. They become apparent when the same parameter is measured several times under the same experimental conditions and different values are obtained. Assuming that the dispersion in the measured values is random, these errors can be treated with statistical methods to obtain, from the data set, a value representative of the measured quantity.
• Systematic Errors: these are associated with the conditions under which the experiment is performed. They show no statistical fluctuation, and their treatment and correction require a careful review of the experimental setup. Common sources of such errors include incorrectly calibrated measuring instruments and mistaken assumptions about experimental conditions such as atmospheric pressure or temperature.
Most of the methods considered for data evaluation are based on statistical concepts, which are also useful for planning experiments. These methods make it possible to establish optimal conditions for the experiment, such as the minimum number of measurements, and to present the experimental data concisely (minimal but significant). Statistics should not be expected to reduce the need for good measurements: statistical methods are most powerful and effective when applied to valid data.

Physical Chemistry II. YachayTech September 1, 2017


1. Statistical Treatment

1.1. Uncertainty and Standard Deviation

In every statistical analysis, one must distinguish between the concepts of precision
and accuracy. For a set of N measurements of an experimental variable y, the mean
is defined as:
$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i , \tag{1}$$
where yi is the result of the ith measurement.
Precision refers to the degree of reproducibility of the measured quantity, i.e. how close the results are to one another when the same quantity is measured several times. In the limit of an infinitely large number of measurements, N → ∞, the precision of the measurements is generally characterized by the variance σ² of the normal distribution associated with the measurements. The variance is defined by:
$$\sigma^2 \equiv \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2 . \tag{2}$$

The square root of the variance is often referred to as the standard deviation (σ),
that is:
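$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N}} \qquad \begin{cases} \text{if } \sigma \text{ is small} \rightarrow \text{high precision} \\ \text{if } \sigma \text{ is large} \rightarrow \text{low precision} \end{cases} \tag{3}$$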

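This parameter is used extensively to indicate the precision associated with a very large number of individual measurements. When the number of measurements is large but finite, the precision is given by the estimated standard deviation of the mean, σm, of the N values, defined as:
$$\sigma_m = \left[ \frac{1}{N(N-1)} \sum_{i=1}^{N} (y_i - \bar{y})^2 \right]^{1/2} = \frac{\sigma}{\sqrt{N}} . \tag{4}$$
Note that the second equality holds when σ is the standard deviation estimated from the finite sample, that is, computed with N − 1 instead of N in the denominator of Equation 3.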
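The definitions in Equations 1 to 4 can be checked numerically. The following is a minimal sketch in plain Python; the function names and the five readings are hypothetical and serve only as an illustration.

```python
import math

def mean(values):
    """Arithmetic mean, Eq. (1)."""
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((y - m) ** 2 for y in values)

def std_population(values):
    """Standard deviation with N in the denominator, Eq. (3)."""
    return math.sqrt(sum_sq_dev(values) / len(values))

def std_sample(values):
    """Estimated standard deviation with N - 1 in the denominator."""
    return math.sqrt(sum_sq_dev(values) / (len(values) - 1))

def std_of_mean(values):
    """Estimated standard deviation of the mean, Eq. (4)."""
    n = len(values)
    return math.sqrt(sum_sq_dev(values) / (n * (n - 1)))

# Hypothetical readings, not taken from the handout.
readings = [10.1, 10.3, 9.9, 10.2, 10.0]

print("mean    =", mean(readings))
print("sigma   =", std_population(readings))
print("sigma_m =", std_of_mean(readings))
# sigma_m equals the N - 1 estimate divided by sqrt(N):
print("s/sqrt(N) =", std_sample(readings) / math.sqrt(len(readings)))
```

For small N the two standard deviations differ noticeably, which is why the estimate with N − 1 is the one that enters σm.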

Accuracy, on the other hand, refers to the proximity of a measurement to an accepted, or “real”, value. Measurements of high precision are not always accurate. A useful illustration of the difference between precision and accuracy is throwing darts at the center of a target, as represented in Figure 1. If the darts are scattered widely over the target, far from each other and from the center, the experiment has low precision and low accuracy (case A). If the darts fall very close together but far from the center, the experiment is precise but not accurate (case B). If the darts are scattered far apart but around the center, the experiment has high accuracy and low precision (case C). Finally, if the darts hit the center and are very close to each other, the experiment has both high accuracy and high precision (case D).

Figure 1: Difference between accuracy and precision.

1.2. Student t Distribution


For a small data set, N ≤ 20, small-sample statistics are required, and the Student t distribution should be used instead of the normal distribution, which applies to large samples. In the small-sample case, the uncertainty is expressed in terms of the estimated standard deviation of the mean (σm) as:

$$\delta_P = t_P \, \sigma_m , \tag{5}$$

where the value of tP depends on the number of degrees of freedom (ν = N − 1, with N the number of measurements made) and on the confidence percentage P. The value of P represents the percentage confidence that the true value lies within ±δP of the measured mean. The values of tP are listed in Table 1 for different values of P and ν.
For example, the uncertainty of a series of eight measurements (ν = 7) with a confidence percentage of 95% is given by:

$$\delta_{95} = t_{95} \, \sigma_m = 2.36 \, \sigma_m . \tag{6}$$
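As an illustration of Equation 5, the sketch below looks up t95 from the 95% column of Table 1 and multiplies it by σm. The dictionary covers only ν = 1 to 10, and the function name and the value σm = 0.010 are hypothetical.

```python
# 95% column of Table 1, indexed by degrees of freedom nu = N - 1.
T95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
       6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}

def uncertainty_95(sigma_m, n_measurements):
    """Return delta_95 = t_95 * sigma_m for a series of n_measurements."""
    nu = n_measurements - 1
    if nu not in T95:
        raise ValueError("nu = %d is not tabulated in this sketch" % nu)
    return T95[nu] * sigma_m

# Hypothetical example: eight measurements with sigma_m = 0.010.
delta = uncertainty_95(0.010, 8)
print("delta_95 =", delta)
```

A lookup table is used here instead of a statistics library so that the numbers match Table 1 exactly.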

Table 1: Critical values of tP for the Student t distribution, for ν degrees of freedom and confidence percentage P (%)

ν \ P (%)   50      80     90     95     98     99     99.9
1           1.00    3.08   6.31   12.7   31.8   63.7   637.0
2           0.816   1.89   2.92   4.30   6.96   9.92   31.6
3           0.765   1.64   2.35   3.18   4.54   5.84   12.9
4           0.741   1.53   2.13   2.78   3.75   4.60   8.61
5           0.727   1.48   2.02   2.57   3.36   4.03   6.87
6           0.718   1.44   1.94   2.45   3.14   3.71   5.96
7           0.711   1.41   1.89   2.36   3.00   3.50   5.41
8           0.706   1.40   1.86   2.31   2.90   3.36   5.04
9           0.703   1.38   1.83   2.26   2.82   3.25   4.78
10          0.700   1.37   1.81   2.23   2.76   3.17   4.59
15          0.691   1.34   1.75   2.13   2.60   2.95   4.07
20          0.687   1.33   1.72   2.09   2.53   2.85   3.85
30          0.683   1.31   1.70   2.04   2.46   2.75   3.65
∞           0.674   1.28   1.64   1.96   2.33   2.58   3.29

1.3. Propagation of Errors

When a set of random errors is present, each experimental variable can be assigned an uncertainty given by σm or δ. For a numerical result F calculated from independent, directly measured quantities x, y, z, ..., the uncertainty of F is given by:
$$[\delta(F)]^2 = \left(\frac{\partial F}{\partial x}\right)^2 [\delta(x)]^2 + \left(\frac{\partial F}{\partial y}\right)^2 [\delta(y)]^2 + \left(\frac{\partial F}{\partial z}\right)^2 [\delta(z)]^2 + \dots \tag{7}$$
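Equation 7 can be applied numerically when the partial derivatives are tedious to write out. The following is a minimal sketch that estimates each derivative by a central finite difference; the function `propagate`, the `density` example and all numerical values are hypothetical.

```python
import math

def propagate(func, values, deltas, rel_step=1e-6):
    """Uncertainty of func(*values) from Eq. (7), with numerical derivatives."""
    total = 0.0
    for i, (v, d) in enumerate(zip(values, deltas)):
        h = abs(v) * rel_step or rel_step   # step size; fallback if v == 0
        up = list(values)
        down = list(values)
        up[i] = v + h
        down[i] = v - h
        dF_dx = (func(*up) - func(*down)) / (2 * h)   # central difference
        total += (dF_dx * d) ** 2
    return math.sqrt(total)

def density(mass, volume):
    """Hypothetical derived quantity F = m / V."""
    return mass / volume

# Hypothetical measurements: m = 12.50 +/- 0.01 g, V = 10.00 +/- 0.02 cm^3.
delta_rho = propagate(density, [12.50, 10.00], [0.01, 0.02])
print("rho =", density(12.50, 10.00), "+/-", delta_rho)
```

For simple products and quotients the analytical derivatives are easy to obtain, so the numerical route is mainly useful as a cross-check.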

1.4. Q-test

Occasionally, in a set of measurements, one value differs considerably from the rest. In these cases, one must decide whether the measurement should be rejected or retained. For this purpose, the Q-test can be used as a simple statistical criterion. In a series of 3 to 10 measurements, if one of them seems to deviate from the rest, the quantity Q can be determined by:
$$Q \equiv \frac{|(\text{suspect value}) - (\text{value closest to it})|}{(\text{highest value}) - (\text{lowest value})} . \tag{8}$$
The value of Q obtained is compared with the critical value Qc shown in Table 2 for the corresponding number of observations in the series. If Q ≥ Qc, the measurement in question should be rejected. If Q < Qc, the measurement should be retained.
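The Q-test of Equation 8 is straightforward to script. The sketch below uses the 90% critical values of Table 2; the function name and the sample data are hypothetical.

```python
# Critical Q values at the 90% confidence level (Table 2), indexed by N.
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
             7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values, suspect):
    """Return (Q, reject) for the suspect value, following Eq. (8)."""
    n = len(values)
    if n not in Q_CRIT_90:
        raise ValueError("the Q-test table covers 3 to 10 measurements")
    spread = max(values) - min(values)
    if spread == 0:
        raise ValueError("all values are identical")
    others = list(values)
    others.remove(suspect)                      # drop one copy of the suspect
    nearest = min(others, key=lambda v: abs(v - suspect))
    q = abs(suspect - nearest) / spread
    return q, q >= Q_CRIT_90[n]

# Hypothetical series in which the last reading looks discordant.
data = [10.12, 10.15, 10.13, 10.14, 10.45]
q, reject = q_test(data, 10.45)
print("Q =", round(q, 3), "reject:", reject)
```

Only the largest or the smallest value of a series is a meaningful candidate for `suspect`.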

Table 2: Critical Q values for rejection of a discordant value at the 90% confidence level

N     3      4      5      6      7      8      9      10
Qc    0.94   0.76   0.64   0.56   0.51   0.47   0.44   0.41

1.5. Fit of Experimental Data

When the experiment consists of evaluating a relation between two variables, the corresponding data are conveniently analyzed by means of graphs. From the graphs one can obtain information about the mathematical model that represents the phenomenon studied. The objective is to find the mathematical model that best fits the experimental data, based on appropriate statistical criteria.
Graph editors represent a series of data by means of its best fit to an analytical model function. To achieve this, statistical criteria are used to assess the goodness of the selected model. The most common approach is the least-squares method, a powerful tool by which a function ŷ(x) is made to represent a set of experimental data yi, measured for a series of values of the independent variable xi. The analytic form may be an equation associated with a theoretical model, or it may be the result of curve fitting, such as a polynomial, which does not necessarily correspond to a theoretical relationship but can provide a useful empirical representation of the data. Regardless of how the fit is made, there will be parameters that must be selected such that the “best fit” is achieved. Usually the number of experimental data points (N), that is, the size of the sample, significantly exceeds the number of adjustable parameters.
The resulting function is not an exact fit at each point but represents the best overall fit, according to the criterion that the sum of the squares of the deviations of the observed values (for each value of the independent variable x) from the fitted model values is a minimum. From this criterion comes the name of the method:
$$\sum_i \left[ y_i - \hat{y}(x_i) \right]^2 \rightarrow \text{minimum} . \tag{9}$$

The deviation within the square brackets is known as the residual:

$$\text{Residual} = y_i - \hat{y}(x_i) . \tag{10}$$

The fitting function ŷ(x) obtained by the method of least squares can be differentiated and integrated analytically, so it is not necessary to apply numerical methods unless the function is such that its analytical manipulation is too cumbersome to carry out in closed form.
One way of quantifying the quality of the fit of the experimental data to the proposed model is to evaluate the coefficient of determination, R². This coefficient, defined in Equation 11, is a statistical measure of the “goodness” or adequacy of the fitted model, since it quantifies how close the model is to the experimental data.

$$R^2 = 1 - \frac{\sum_i \left( y_i - \hat{y}(x_i) \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2} \tag{11}$$

The value of R² lies between 0 and 1. A value of R² = 1 indicates that the model fits the experimental data perfectly, and the quality of the fit decreases as R² moves away from 1. A value of R² = 0 indicates that the model does not describe the experimental data at all.
It is important to keep in mind that a value of R² close to 1 does not necessarily imply a good fit. There are complementary ways to evaluate the goodness of fit, such as the evaluation of the residual, defined in Equation 10. The analysis of residuals plays a fundamental role in evaluating the model fitted to the experimental data. This analysis not only makes it possible to verify the hypothesis of the predicted model, but also allows one to detect atypical observations (outliers), the existence of an omitted variable, and errors in the selected model, among other factors that detract from the randomness of the residuals, a condition necessary to validate the model. If the residuals appear to behave randomly, this suggests that the model fits the data well. However, if the residuals display a systematic pattern, it is a clear sign that the model fits the data poorly.
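The least-squares criterion, the residuals and R² can be illustrated for the simplest model, a straight line ŷ(x) = a·x + b. The following is a minimal sketch in plain Python using the closed-form solution for a and b; the function names and the five data points are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def r_squared(ys, fitted):
    """Coefficient of determination, Eq. (11)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = linear_fit(xs, ys)
fitted = [slope * x + intercept for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]   # Eq. (10)
r2 = r_squared(ys, fitted)

print("slope =", slope, "intercept =", intercept)
print("R^2 =", r2)
print("residuals =", residuals)
```

In this example the residuals alternate in sign without an obvious trend, which is the kind of behavior expected when the model is adequate.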

2. Case Study I: Determination of the mean of a series of measurements and its uncertainty

As a case study, a statistical analysis is carried out on a series of experimental measurements, made with a polarimeter, of the rotation angle of polarized light passing through the problem solution. This angle is the optical rotation of the solution, α. In this case, the optical rotation of a crystalline compound dissolved in a known volume of water is analyzed at constant temperature. Although the experimental data correspond to the optical rotation of the crystalline compound (solute), the property of interest, as reported in the literature at a given wavelength λ and temperature t (°C), is the specific optical rotation $[\alpha]_\lambda^{t(^\circ\mathrm{C})}$. The two quantities, α and $[\alpha]_\lambda^{t(^\circ\mathrm{C})}$, are related by the equation:
$$[\alpha]_\lambda^{t(^\circ\mathrm{C})} = \frac{V}{L \, m} \, \alpha , \tag{12}$$
where V is the volume (in cm³) of the aqueous solution of the crystalline compound, which is contained in the polarimeter cell of length L (in dm) and contains a mass m (in g) of the crystalline compound dissolved in water.
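Because Equation 12 contains only products and quotients of the measured quantities, applying the propagation formula of Equation 7 to it leads to a compact expression in terms of relative uncertainties. The following derivation is a sketch that assumes α, V, L and m are independent; the shorthand [α] for the specific rotation is used only here.

```latex
\frac{\partial[\alpha]}{\partial\alpha} = \frac{[\alpha]}{\alpha}, \qquad
\frac{\partial[\alpha]}{\partial V} = \frac{[\alpha]}{V}, \qquad
\frac{\partial[\alpha]}{\partial L} = -\frac{[\alpha]}{L}, \qquad
\frac{\partial[\alpha]}{\partial m} = -\frac{[\alpha]}{m}

\left(\frac{\delta[\alpha]}{[\alpha]}\right)^2 =
\left(\frac{\delta\alpha}{\alpha}\right)^2 +
\left(\frac{\delta V}{V}\right)^2 +
\left(\frac{\delta L}{L}\right)^2 +
\left(\frac{\delta m}{m}\right)^2
```

The signs of the derivatives disappear on squaring, so each measured quantity contributes its relative uncertainty in quadrature.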

2.1. Objectives

• Carry out a statistical analysis of a given sample of data in order to correctly report the mean value of the measurements with its associated uncertainty.
• Examine some basic statistical criteria for properly reporting experimental results with their respective errors.
• Differentiate between systematic and random errors associated with a data series.

2.2. Methodology

2.2.1. Description of the experiment


The data given in Table 3 correspond to the rotation angle of polarized light for an aqueous solution of a dextrorotatory (positive optical rotation) optically active crystalline compound, recorded at a temperature of 25 °C. The aim is to evaluate the purity of the crystalline compound, since it may be contaminated with the levorotatory optical isomer, which would decrease the magnitude of the positive rotation. To evaluate whether the sample is contaminated, the specific rotation of the test sample is determined and compared with the value reported in the literature at the temperature of the experiment. For the crystalline compound analyzed, the reported value is $[\alpha]_D^{25\,^\circ\mathrm{C}}(\text{report.}) = 152.70^\circ$. Unless otherwise specified, the wavelength is λ = 589 nm, the D line of a sodium lamp.
The values of α were recorded while keeping the polarimeter at constant temperature by means of a jacket covering the polarimeter, through which water is recirculated from a thermostat bath regulated at (25.0 ± 0.2) °C.

Table 3: Experimental data for the optical rotation (α) of the problem crystalline compound measured at a temperature of 25 °C.

α (°)   20.04   20.07   20.05   20.09   20.04   20.02   20.04   20.03   20.06   20.05

The values of the mass of the crystalline compound (m, solute), the volume (V) and the cell length (L) of the polarimeter are shown in Table 4.

Table 4: Experimental data for the mass (m), volume (V) and cell length (L) used to measure the rotation (α) of the problem crystalline compound at a temperature of (supposedly) 25 °C.

m ± Δm (g)         V ± ΔV (cm³)     L ± ΔL (dm)
1.5220 ± 0.0003    25.00 ± 0.02     2.000 ± 0.002

2.2.2. Procedure for the treatment of experimental data:

1. From the series of data shown in Table 3, calculate the mean of the optical rotation (ᾱ).

2. Estimate the standard deviation (σ) and the estimated standard deviation of the mean (σm) of the data series.
3. Given the size of the data sample, determine the uncertainty (δ) of the value of ᾱ at a 95% confidence level using the appropriate statistical criterion.
4. Report correctly the result for the mean optical rotation (ᾱ) with its respective uncertainty.
5. Determine the specific rotation $[\alpha]_D^{25\,^\circ\mathrm{C}}$ using Equation 12, and correctly report the value obtained.
6. Compare the value obtained for $[\alpha]_D^{25\,^\circ\mathrm{C}}$ with the value reported in the literature and estimate the error associated with the experimental result.
7. Evaluate the following sources of error, make the appropriate corrections, and report the new results correctly:
   i. The Q-test was not applied correctly, so some of the data that were initially rejected may need to be included in the statistical treatment.
   ii. The operator wrote down the data erroneously, so the actual mass was not recorded: the “5” in the annotated measurement should have been a “6”.
   iii. The temperature of the thermostat bath of the water recirculating through the polarimeter jacket was not 25 °C at the time of the measurements, but 18 °C.
   iv. The polarimeter was badly calibrated: a measurement with pure H2O gave a value of α = +0.26° instead of the expected 0.00°.
8. Determine the percent purity of the given sample with respect to the dextrorotatory isomer.
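Steps 1 to 5 of the procedure above can be sketched in a few lines of Python. This is only one possible way to organize the calculation. It assumes that all ten readings of Table 3 are retained, that the tolerances of Table 4 can be combined in quadrature with δ95 through Equation 7, and that t95 = 2.26 (ν = 9, Table 1); the variable names are hypothetical.

```python
import math

# Table 3: optical rotation readings, in degrees.
alpha = [20.04, 20.07, 20.05, 20.09, 20.04,
         20.02, 20.04, 20.03, 20.06, 20.05]

# Table 4: value and uncertainty of each auxiliary quantity.
m, dm = 1.5220, 0.0003    # g
V, dV = 25.00, 0.02       # cm^3
L, dL = 2.000, 0.002      # dm

T95_NU9 = 2.26            # Table 1, P = 95 %, nu = N - 1 = 9

n = len(alpha)
alpha_mean = sum(alpha) / n                          # step 1
ss = sum((a - alpha_mean) ** 2 for a in alpha)
sigma = math.sqrt(ss / (n - 1))                      # step 2 (N - 1 estimate)
sigma_m = sigma / math.sqrt(n)                       # step 2, Eq. (4)
delta_alpha = T95_NU9 * sigma_m                      # step 3, Eq. (5)

specific = V * alpha_mean / (L * m)                  # step 5, Eq. (12)

# Eq. (7) applied to Eq. (12): relative uncertainties add in quadrature.
rel = math.sqrt((delta_alpha / alpha_mean) ** 2 + (dV / V) ** 2
                + (dL / L) ** 2 + (dm / m) ** 2)
delta_specific = specific * rel

print("alpha = %.3f +/- %.3f deg" % (alpha_mean, delta_alpha))   # step 4
print("specific rotation = %.2f +/- %.2f" % (specific, delta_specific))
```

The corrections listed in step 7 can be explored by changing the corresponding input (for example `m`, or subtracting a zero offset from every entry of `alpha`) and running the script again.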

3. Case Study II: Fit experimental data to a theoretical model

In this case study, the variation of viscosity with temperature is examined. The viscosity of a liquid is the resistance of the molecules that form it to separate from each other, that is to say, the resistance of a fluid to shear, which arises from the cohesive forces between each molecule of the liquid (or fluid) and the other molecules of the same liquid. The effect of temperature on viscosity can be represented by the following equation:
$$\eta \propto e^{-E_a/RT} , \tag{13}$$
where T is the absolute temperature, Ea is an activation energy and R is the gas constant.
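To fit Equation 13 with a straight line, it helps to take the natural logarithm of both sides. In the sketch below, A is a symbol introduced here for the proportionality constant; it does not appear in the original equation.

```latex
\eta = A \, e^{-E_a/RT}
\quad\Longrightarrow\quad
\ln \eta = \ln A - \frac{E_a}{R} \cdot \frac{1}{T}
```

With this transformation, a plot of ln η against 1/T (T in kelvin) is expected to be linear, with slope −Ea/R and intercept ln A under the sign convention of Equation 13.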

3.1. Objectives
• Apply the fitting procedures to a given experimental data series.
• Determine the fitting parameters of the selected model and give them an appropriate physical meaning.

3.2. Methodology
Using an appropriate experimental method, the viscosity of ethanol was determined at different temperatures. The experimental data obtained are shown in Table 5.
Table 5: Experimental data for the viscosity (η) of pure ethanol measured at several temperatures.

t (°C)    0      10     20     30     40     50     60     70
η (cP)    1.78   1.45   1.17   0.98   0.83   0.69   0.60   0.51

3.3. Procedure for the treatment of experimental data:


1. From the experimental data given in Table 5, plot the data using the appropriate variables to fit the model given in Equation 13.
2. Try a linear fit of the given experimental data and evaluate the goodness of this fit.
3. Find the best fit using a graph editor and compare it with the model represented by Equation 13.
4. Use Equation 13 to determine the fitting parameters and give them an appropriate physical meaning.
5. Determine the residuals from the experimental data and the fitted model.
6. Plot the residuals vs. the fitted values and discuss the behavior of this graph.
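The numerical part of the steps above can be sketched as follows, as an alternative or cross-check to a graph editor. The script compares a direct linear fit of η against t with a linear fit of ln η against 1/T, which is the linearized form of Equation 13. The helper functions, the value R = 8.314 J mol⁻¹ K⁻¹ and the conversion T = t + 273.15 are assumptions of this sketch. With the sign convention of Equation 13 as printed, the slope equals −Ea/R, so the sign of the fitted Ea follows directly from the sign of the slope; interpreting it is part of step 4.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

# Table 5: viscosity of ethanol.
t_celsius = [0, 10, 20, 30, 40, 50, 60, 70]
eta = [1.78, 1.45, 1.17, 0.98, 0.83, 0.69, 0.60, 0.51]   # cP

def linear_fit(xs, ys):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def r_squared(ys, fitted):
    """Coefficient of determination, Eq. (11)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Step 2: direct linear fit of eta against t.
slope_lin, icpt_lin = linear_fit(t_celsius, eta)
r2_linear = r_squared(eta, [slope_lin * t + icpt_lin for t in t_celsius])

# Steps 1 and 4: linearized Eq. (13), ln(eta) against 1/T.
inv_T = [1.0 / (t + 273.15) for t in t_celsius]
ln_eta = [math.log(e) for e in eta]
slope, intercept = linear_fit(inv_T, ln_eta)
fitted = [slope * x + intercept for x in inv_T]
r2_arrhenius = r_squared(ln_eta, fitted)
Ea = -slope * R            # J/mol, sign convention of Eq. (13)

# Step 5: residuals of the linearized fit, Eq. (10).
residuals = [y - f for y, f in zip(ln_eta, fitted)]

print("linear fit:     R^2 = %.4f" % r2_linear)
print("ln(eta) vs 1/T: R^2 = %.4f, slope = %.1f K" % (r2_arrhenius, slope))
print("Ea = %.0f J/mol" % Ea)
```

Plotting `residuals` against `fitted` (step 6) shows whether the scatter around the model is random or follows a pattern.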
