Receptor models are based on measured mass concentrations and the use of appropriate
mass balances. For example, assume that the total concentration of particulate iron
measured at a site can be considered to be the sum of contributions from a number of
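independent sources
$$\mathrm{Fe_{total}} = \mathrm{Fe_{soil}} + \mathrm{Fe_{auto}} + \cdots \qquad (26.1)$$
where $\mathrm{Fe_{total}}$ is the measured iron concentration, $\mathrm{Fe_{soil}}$ and $\mathrm{Fe_{auto}}$ are the concentrations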
contributed by soil emissions and automobiles, and so on. Let us start from a rather simple
scenario illustrating the major concepts used in receptor modeling.
Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, Second Edition, by John H. Seinfeld
and Spyros N. Pandis. Copyright © 2006 John Wiley & Sons, Inc.
RECEPTOR MODELING METHODS
Source Apportionment Assume that for a rural site the measured PM10 concentration
is 32 µg m⁻³, containing 2.58 µg m⁻³ Si and 3.084 µg m⁻³ Fe. The two major
sources contributing to the location's particulate concentration are a coal-fired power
plant and soil-related dust. Analysis of the emissions of these sources indicates that the
soil contains 200 mg(Si) g⁻¹ (20% of the total emissions) and 32 mg(Fe) g⁻¹ (3.2% of
the total emissions), while the particles emitted by the power plant contain
10 mg(Si) g⁻¹ (1%) and 150 mg(Fe) g⁻¹ (15%). Neglecting Si and Fe contributions
from other sources
$$\mathrm{Si_{total}} = \mathrm{Si_{soil}} + \mathrm{Si_{power}} \qquad (26.2)$$
$$\mathrm{Fe_{total}} = \mathrm{Fe_{soil}} + \mathrm{Fe_{power}} \qquad (26.3)$$
If S and P are the total aerosol contributions (in µg m⁻³) from dust and the power plant
to the PM10 concentration in the receptor, then
PM10 = S + P + E (26.4)
where E is the contribution from any additional sources. If the composition of the
particles does not change during their transport from the sources to the receptor,
then, using the initial composition of the emissions, we obtain
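$$\begin{aligned}
\mathrm{Si_{soil}} &= 0.2\,S, & \mathrm{Fe_{soil}} &= 0.032\,S \\
\mathrm{Si_{power}} &= 0.01\,P, & \mathrm{Fe_{power}} &= 0.15\,P
\end{aligned} \qquad (26.5)$$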
Substituting (26.5) into (26.2) and (26.3) gives an algebraic system of two equations
with two unknowns, the contributions of the two sources, S and P, to the receptor aerosol
concentration. The solution of the system using the measured Si and Fe concentrations is
S = 12 µg m⁻³ and P = 18 µg m⁻³. Using (26.4), we also find that E = 2 µg m⁻³,
and therefore the power plant is contributing 56.2%, the dust 37.5%, and the
unknown sources 6.3% to the PM10 of the specific location. Recall that we have
implicitly assumed that the unknown sources contribute negligible Si or Fe to the
levels measured at the location.
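The arithmetic of this example can be checked numerically. The following is a minimal sketch, assuming NumPy is available and an Fe concentration of 3.084 µg m⁻³ (the value that reproduces the solution quoted above); the variable names are illustrative only.

```python
import numpy as np

# Source profiles as mass fractions of the emitted particles.
# Rows: Si, Fe. Columns: soil dust, power plant.
A = np.array([[0.200, 0.010],
              [0.032, 0.150]])

# Receptor concentrations of Si and Fe (µg m^-3).
# Fe = 3.084 is assumed here because it is consistent with S = 12 and P = 18.
c = np.array([2.58, 3.084])
pm10 = 32.0  # measured PM10 (µg m^-3)

# Solve the 2 x 2 mass balance for the dust (S) and power plant (P) contributions.
S, P = np.linalg.solve(A, c)
E = pm10 - S - P  # remainder attributed to other sources, as in (26.4)

print(f"S = {S:.2f}, P = {P:.2f}, E = {E:.2f} µg m^-3")
for name, value in (("dust", S), ("power plant", P), ("other sources", E)):
    print(f"{name}: {100.0 * value / pm10:.2f}% of PM10")
```

Solving the linear system directly, rather than eliminating one unknown by hand, keeps the sketch close to the matrix formulation used for larger problems.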
This example describes a simple scenario but demonstrates the utility of receptor
modeling. One can calculate the contribution of several sources to the atmospheric
concentrations at a given location with knowledge of only source and receptor compositions.
No information regarding meteorology, topography, or the location and magnitude of sources
is necessary.
A general mathematical framework can be developed for solution of problems similar
to the example above. Suppose that for a given area there are m sources and n species.
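If $a_{ij}$ is the fraction of chemical species i in the particulate emissions from source j, then
the composition of sources can be described by a matrix A. For the conditions of the
previous example (rows: Si, Fe; columns: soil, power plant)
$$\mathbf{A} = \begin{bmatrix} 0.2 & 0.01 \\ 0.032 & 0.15 \end{bmatrix} \qquad (26.6)$$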
Let $c_i$ be the concentration (in µg m⁻³) of element i (i = 1, 2, ..., n) at a specific site, and
let $f_{ij}$ be a fraction representing any modification to the source composition $a_{ij}$ due to
atmospheric processes (e.g., gravitational settling) that occurs between the source and the
receptor points. Then $f_{ij} a_{ij}$ will be the fraction of species i in the particulate concentrations
from source j at the receptor. If $s_j$ is the total contribution (in µg m⁻³) of the particles from
source j to the particulate concentration at the receptor site, we can express the
concentration of element i at the site as
$$c_i = \sum_{j=1}^{m} f_{ij}\, a_{ij}\, s_j, \qquad i = 1, 2, \ldots, n \qquad (26.7)$$
Usually $f_{ij}$ is assumed equal to unity, thus assuming that the source signature $a_{ij}$ is not
modified by processes (reactions, removal, etc.) occurring during atmospheric transport
between source and receptor. In this case we simply have
$$c_i = \sum_{j=1}^{m} a_{ij}\, s_j, \qquad i = 1, 2, \ldots, n \qquad (26.9)$$
Thus the concentration of each chemical element at a receptor site becomes a linear
combination of the contributions of each source to the particulate matter at that site. Given
the chemical composition of the ambient sample $c_i$ and the source emission signature $a_{ij}$,
(26.9) can then be solved to provide the source contributions $s_j$.
If there are k ambient aerosol samples, then let cik be the concentration of element i in
the sample k. The source contributions will in general be different from sampling period to
sampling period depending on wind direction, emission strength, and so on. Equation
(26.9) can then be written in a more general form for the k samples as
$$c_{ik} = \sum_{j=1}^{m} a_{ij}\, s_{jk} \qquad (26.10)$$
where $s_{jk}$ is the concentration (in µg m⁻³) of material from source j collected in the
sample k. A number of approaches based on (26.10) have been used to develop our
understanding of source-receptor relationships for nonreactive species in an airshed.
These methods include the chemical mass balance (CMB) used for source apportionment,
the principal-component analysis (PCA) used for source identification, and the
empirical orthogonal function (EOF) method used for identification of the location and
strengths of emission sources. A detailed review of all the variations of these basic
methods is outside the scope of this book. For more information, the reader is referred to
treatments by Watson (1984), Henry et al. (1984), Cooper and Watson (1980), Watson
et al. (1981), Macias and Hopke (1981), Dattner and Hopke (1982), Pace (1986), Watson
et al. (1989), Gordon (1980, 1988), Stevens and Pace (1984), Hopke (1985, 1991), and
Javitz et al. (1988).
$$c_i = \hat{c}_i + e_i, \qquad i = 1, 2, \ldots, n \qquad (26.11)$$
It is assumed in CMB that the measurement errors ei are random, uncorrelated, and
normally distributed about a mean value of zero. These errors can be characterized
statistically by the standard deviation σi of their normal distributions.
For an initial guess of source contributions sj, the predicted concentrations pi for all
elements are given by
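$$p_i = \sum_{j=1}^{m} a_{ij}\, s_j, \qquad i = 1, 2, \ldots, n \qquad (26.12)$$
The measurement uncertainty depends on the element, and to account for these different
degrees of uncertainty, $1/\sigma_i^2$ are used as weighting factors. Summarizing, one needs to minimize
$$\xi^2 = \sum_{i=1}^{n} \frac{(c_i - p_i)^2}{\sigma_i^2} \qquad (26.13)$$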
by choosing appropriate values of the contributions $s_j$. Note that by using the weighting
factors $1/\sigma_i^2$, elements with large uncertainties contribute less to the $\xi^2$ function than
elements with smaller uncertainties. Combining (26.12) and (26.13), we obtain
$$\xi^2 = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Bigl( c_i - \sum_{j=1}^{m} a_{ij}\, s_j \Bigr)^2 \qquad (26.14)$$
where n is the number of species and m is the number of sources. The solution approach is
to minimize the value of $\xi^2$ with respect to each of the m coefficients $s_j$, yielding a set of m
simultaneous equations with m unknowns ($s_1, s_2, \ldots, s_m$). This is the common multiple
regression analysis problem. The solution is the vector s of source contributions given by
$$\mathbf{s} = [\mathbf{A}^T \mathbf{W} \mathbf{A}]^{-1} \mathbf{A}^T \mathbf{W} \mathbf{c} \qquad (26.15)$$
where A is the n × m source matrix with the source compositions $a_{ij}$, W is the n × n
diagonal matrix with elements of the weighting factors, $w_{ii} = 1/\sigma_i^2$, $\mathbf{A}^T$ is the m × n
transpose¹ of A, c is the vector with the measurements of the n elements, and s is the vector
with the m source contributions. Note that $[\mathbf{A}^T \mathbf{W} \mathbf{A}]$ is an m × m square matrix, so it can be
inverted provided that it is nonsingular.
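The weighted least-squares solution (26.15) can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `cmb_wls`, the four-species profile matrix, the "true" contributions, and the 10% uncertainties are hypothetical values chosen only to exercise the formula.

```python
import numpy as np

def cmb_wls(A, c, sigma):
    """Weighted least-squares CMB estimate, s = (A^T W A)^-1 A^T W c."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)  # w_ii = 1 / sigma_i^2
    AtW = A.T @ W
    # Solve the m x m normal equations instead of forming the inverse explicitly.
    return np.linalg.solve(AtW @ A, AtW @ c)

# Hypothetical profiles: n = 4 species (rows), m = 2 sources (columns).
A = np.array([[0.200, 0.010],
              [0.032, 0.150],
              [0.050, 0.020],
              [0.010, 0.080]])
s_true = np.array([12.0, 18.0])  # assumed contributions (µg m^-3)
c = A @ s_true                   # error-free synthetic "measurements"
sigma = 0.10 * c                 # assumed 10% measurement uncertainty

s = cmb_wls(A, c, sigma)
print(s)
```

With error-free synthetic data the estimate reproduces the assumed contributions; with real measurements the weights determine how strongly each species constrains the fit.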
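The solution of the receptor problem using (26.15) considers uncertainties in the
measurements $c_i$ but neglects the inherent uncertainty in the source compositions $a_{ij}$. Let
us denote by $\sigma_{a_{ij}}$ the standard deviation of a determination of the fraction $a_{ij}$ of element i in
the emissions of source j. The solution can then be calculated by an expression analogous
to (26.15) (Watson 1979; Hopke 1985):
$$\mathbf{s} = [\mathbf{A}^T \mathbf{V} \mathbf{A}]^{-1} \mathbf{A}^T \mathbf{V} \mathbf{c} \qquad (26.16)$$
where V is the diagonal matrix with elements
$$v_{ii} = \Bigl( \sigma_i^2 + \sum_{j=1}^{m} \sigma_{a_{ij}}^2\, s_j^2 \Bigr)^{-1} \qquad (26.17)$$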
The unknown source contributions $s_j$ are included in the elements of the V matrix, and
therefore an iterative solution of (26.16) is necessary. The first step is to assume that
$\sigma_{a_{ij}} = 0$, solve (26.16) directly, and calculate the first approximation of $s_j$. Then $v_{ii}$ can
be calculated from (26.17) and a second approximation is found. If this approach converges,
the solution is found. This approach using (26.16) and (26.17) is known as the effective
variance method.

¹ The transpose of an n × m matrix A, denoted by $\mathbf{A}^T$, is simply the m × n matrix obtained by
interchanging all the rows and columns.
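The iteration described above can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming that the effective variance takes the usual form $v_{ii} = 1/(\sigma_i^2 + \sum_j \sigma_{a_{ij}}^2 s_j^2)$; the function name, tolerance, profiles, and uncertainties are hypothetical.

```python
import numpy as np

def cmb_effective_variance(A, c, sigma_c, sigma_a, tol=1e-8, max_iter=100):
    """Iterate (26.16) and (26.17); returns (s, number of iterations used)."""
    A, c = np.asarray(A, dtype=float), np.asarray(c, dtype=float)
    sigma_c = np.asarray(sigma_c, dtype=float)
    sigma_a = np.asarray(sigma_a, dtype=float)
    s = np.zeros(A.shape[1])  # s = 0 makes the first pass equivalent to sigma_a = 0
    for iteration in range(1, max_iter + 1):
        v = 1.0 / (sigma_c**2 + (sigma_a**2) @ (s**2))  # assumed form of v_ii
        AtV = A.T * v                                   # same as A^T V for diagonal V
        s_new = np.linalg.solve(AtV @ A, AtV @ c)
        if np.max(np.abs(s_new - s)) <= tol * max(1.0, np.max(np.abs(s_new))):
            return s_new, iteration
        s = s_new
    raise RuntimeError("effective variance iteration did not converge")

# Hypothetical profiles: n = 4 species (rows), m = 2 sources (columns).
A = np.array([[0.200, 0.010],
              [0.032, 0.150],
              [0.050, 0.020],
              [0.010, 0.080]])
s_true = np.array([12.0, 18.0])                          # assumed contributions
c = A @ s_true + np.array([0.05, -0.04, 0.02, -0.03])    # small synthetic errors
sigma_c = 0.05 * c                                       # assumed measurement uncertainty
sigma_a = 0.10 * A                                       # assumed profile uncertainty

s_ev, n_iter = cmb_effective_variance(A, c, sigma_c, sigma_a)
s_wls, _ = cmb_effective_variance(A, c, sigma_c, np.zeros_like(A))
print(s_ev, n_iter, s_wls)
```

Setting all profile uncertainties to zero reduces the scheme to the weighted least-squares solution, which is a convenient check on an implementation.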
The major assumptions used by the CMB model are
These assumptions are fairly restrictive and may be difficult to satisfy for most CMB
applications. When they are not satisfied, the CMB predictions may be unrealistic (e.g.,
negative contributions) or may include significant uncertainties.
The application of CMB to an area poses a number of difficulties in addition to the
assumptions of the method. Let us assume that a particulate sample has been collected in the
area of interest and its elemental composition has been determined. The first issue that one
needs to address is which sources should be included in the model. If an emission inventory
exists for the region, it can be used to determine the major sources. The second issue is
which source profiles should be used. Profiles measured in studies of other areas may
apply only to the specific sources sampled there. For example, the emission fingerprint of a power
plant in Ohio may not be representative of a power plant in Texas. Local sources of road and
soil dust are usually different from location to location. To complicate things even further,
emission profiles often change with time. For example, motor vehicle emission composition
has changed dramatically in the last 40 years with the introduction of new fuels (unleaded
gasoline), new engines, and control technologies. Uncertainties or errors in the CMB results
can be reduced noticeably by obtaining source profile measurements that correspond to the
period of the ambient measurements (Glover et al. 1991). It is clearly essential for the CMB
application to know the area that is to be modeled (Hopke 1985).
When multiple samples are available, CMB should be applied to each sample separately
and then the results can be averaged (Hopke 1985). This approach, even if more
time-consuming, is more accurate than the CMB application to the averaged measurements.
Information is generally lost during averaging of sample composition data and cannot be
recovered later by CMB.
Chemical  Paved Road         Vegetative         Primary            Motor              Limestone
Species   Dust               Burning            Crude Oil          Vehicle
NO3⁻      0 ± 0.47           0.462 ± 0.123      0 ± 0.002          0 ± 0.001          0 ± 0.001
SO4²⁻     0.547 ± 1.17       1.423 ± 0.423      20.32 ± 4.24       3.11 ± 3.55        3.06 ± 0.3
NH4⁺      0 ± 0.008          0.0852 ± 0.057     0.0076 ± 0.005     0 ± 0.001          0 ± 0.001
Na⁺       0.181 ± 0.055      0.143 ± 0.052      0.762 ± 0.399      0 ± 0.001          0 ± 0.001
EC        2.69 ± 1.44        15.89 ± 5.80       0 ± 0.072          54.15 ± 19.78      0 ± 0.001
OC        19.5 ± 4.67        44.60 ± 7.94       0.0894 ± 0.118     49.81 ± 24.15      0 ± 0.001
Al        9.34 ± 1.11        0.0019 ± 0.027     0 ± 0.009          0.077 ± 0.051      2.11 ± 0.21
Si        23.2 ± 2.62        0 ± 0.015          0.011 ± 0.016      0.957 ± 1.39       6.5 ± 0.65
P         0.304 ± 0.05       0 ± 0.022          0 ± 0.17           0.057 ± 0.02       0 ± 0.001
S         0.520 ± 0.17       0.521 ± 0.176      5.45 ± 0.39        1.037 ± 1.182      1.02 ± 0.1
Cl        0.163 ± 0.031      1.908 ± 0.64       0.024 ± 0.021      0.029 ± 0.02       0.46 ± 0.05
K         1.95 ± 0.28        3.993 ± 1.24       0.044 ± 0.054      0.008 ± 0.008      0.16 ± 0.04
Ca        2.98 ± 0.43        0.0659 ± 0.056     0.062 ± 0.005      0.072 ± 0.079      29.52 ± 2.95
Ti        0.499 ± 0.067      0.0009 ± 0.016     0.012 ± 0.002      0.001 ± 0.003      0.08 ± 0.04
V         0.0311 ± 0.008     0.0005 ± 0.007     0.823 ± 0.058      0.001 ± 0.002      0 ± 0.1
Cr        0.0299 ± 0.003     0 ± 0.0016         0.007 ± 0.025      0 ± 0.002          0 ± 0.01
Mn        0.106 ± 0.016      0.0007 ± 0.001     0.0056 ± 0.001     0.028 ± 0.024      0.05 ± 0.03
Fe        5.41 ± 0.88        0.0006 ± 0.001     0.2134 ± 0.022     0.001 ± 0.005      1.04 ± 0.1
Co        0.0059 ± 0.076     0.0001 ± 0.001     0.0185 ± 0.002     0 ± 0.001          0 ± 0.001
Ni        0.0111 ± 0.001     0.0001 ± 0.001     0.789 ± 0.093      0 ± 0.002          0 ± 0.1
Cu        0.02 ± 0.002       0.0001 ± 0.001     0.0009 ± 0.003     0.005 ± 0.003      0.02 ± 0.01
Zn        0.172 ± 0.026      0.0866 ± 0.036     0.260 ± 0.034      0.053 ± 0.028      0.1 ± 0.01
Ga        0.0003 ± 0.006     0 ± 0.0021         0.0132 ± 0.002     0.002 ± 0.002      0 ± 0.001
As        0.0014 ± 0.042     0.0002 ± 0.002     0.0006 ± 0.001     0.004 ± 0.012      0 ± 0.001
Se        0.0001 ± 0.002     0.0004 ± 0.001     0.0114 ± 0.002     0 ± 0.002          0 ± 0.001
Br        0.0095 ± 0.001     0.0096 ± 0.002     0.0003 ± 0.0002    0.264 ± 0.152      0.03 ± 0.01
Sr        0.0794 ± 0.006     0.0007 ± 0.001     0.0015 ± 0.0003    0 ± 0.003          0 ± 0.001
Y         0.0025 ± 0.004     0.0001 ± 0.001     0.0008 ± 0.0003    0 ± 0.004          0 ± 0.001
Zr        0.0091 ± 0.002     0 ± 0.0019         0.0006 ± 0.0004    0 ± 0.019          0 ± 0.001
Mo        0.0004 ± 0.006     0 ± 0.0033         0.0168 ± 0.002     0 ± 0.012          0 ± 0.001
Ag        0 ± 0.016          0.0003 ± 0.007     0.0002 ± 0.002     0 ± 0.016          0 ± 0.001
Cd        0.0015 ± 0.017     0.0007 ± 0.008     0.0006 ± 0.002     0 ± 0.02           0 ± 0.001
In        0.0030 ± 0.02      0.0001 ± 0.009     0.0009 ± 0.002     0 ± 0.026          0 ± 0.001
Sn        0.0037 ± 0.027     0 ± 0.012          0.0007 ± 0.003     0 ± 0.031          0 ± 0.001
Sb        0.0054 ± 0.03      0.0022 ± 0.014     0.0006 ± 0.003     0 ± 0.069          0 ± 0.001
Ba        0.064 ± 0.103      0.0095 ± 0.05      0.0013 ± 0.011     0 ± 0.129          0 ± 0.001
La        0.0142 ± 0.117     0.0016 ± 0.056     0.0041 ± 0.013     0 ± 0.236          0 ± 0.001
Hg        0.0015 ± 0.008     0 ± 0.0037         0 ± 0.0009         0 ± 0.002          0 ± 0.001
Pb        0.265 ± 0.032      0.004 ± 0.003      0 ± 0.0013         0.373 ± 0.207      0.27 ± 0.03
cannot be used for source apportionment in this specific case. Results of the source
apportionment using the CMB method (using CMB for each sample and then
averaging the results) are shown in Table 26.3. The major contributors to the
annual average PM10 concentrations that exceeded 50 µg m⁻³ were primary geologic
material and ammonium nitrate. For the PM2.5, secondary NH4NO3 and (NH4)2SO4,
together with primary motor vehicle emissions and vegetative burning, were the major
contributors.
CMB Evaluation A method often used to evaluate CMB is to estimate the source
contributions from only selected measured elements and then to compare the predictions
for the remaining elements with their measured values as a test of the analysis. For
example, Kowalczyk et al. (1982) used the CMB and nine elements (Na, V, Pb, Zn, Ca, Al,
Fe, Mn, As) to calculate contributions of seven sources to the Washington, DC, aerosol.
Each of the selected elements was characteristic of a source: Na for seasalt, V for fuel oil,
Pb for motor vehicles, Zn for refuse incineration, Ca for limestone, Al and Fe for coal and
soil, and As for coal. The authors used 130 samples from a network of 10 stations. Cr, Ni,
Cu, and Se were significantly underestimated, but the concentrations of the remaining
elements were successfully reproduced by CMB. Kowalczyk et al. (1982) repeated the
exercise using 9-30 marker elements and found little difference in the results as long as the key
elements (Pb, Na, and V) were included. They also observed that including some elements
(namely, Br and Ba) as markers gave erroneous results for several other elements.
The absolute accuracy of the CMB cannot be tested easily, because the true results are
unknown. However, artificial data sets can be created by assuming a realistic distribution of
sources, source strengths, and meteorology, simulating the scenario with a deterministic
transport model (see Chapter 25), and using CMB to apportion the source contributions to the
modeled concentrations. Gerlach et al. (1983) reported the results of such a test using a
typical city plan and 13 known sources. The results of the CMB application indicated that the
contributions of nine of the sources were accurately predicted (errors less than 20%) while
errors as much as a factor of 4 were found for the remaining four sources. The contributions
of the six most important sources were accurately predicted by CMB and the errors were
associated with sources of secondary importance.
CMB Resolution A final issue that may complicate the application of the CMB on
ambient data sets is existence of two sources with similar fingerprints or, more generally, a
source whose profile is a linear combination of other source profiles. This is called the
collinearity problem. If this is the case, then the matrix $[\mathbf{A}^T \mathbf{W} \mathbf{A}]$ used in (26.15) has two
columns that are nearly identical, or one column that is nearly a linear combination of
several others. From a mathematical point of view this matrix is close to singular, and the
result of its inversion is extremely sensitive to small errors. Often, if this is the case, the
results of CMB are large positive and negative source contributions. The simplest solution to
this problem is identification of the
"offending" sources and elimination of one of them. Physically, because the sources are too
similar, it is difficult for CMB to quantify the contribution of each. Thus there are limits to
how far source contributions can be resolved even with almost perfect information; only
significantly different sources can be treated by CMB. Similar sources have to be combined
into a lumped source. Henry (1983) and Hopke (1985) have proposed algorithms that can be
used for the a priori identification of estimable sources and the estimable source
combinations that can be determined for a given source matrix.
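One common way to see the collinearity problem numerically is to inspect the condition number of the matrix that must be inverted. The sketch below assumes NumPy; the profiles, the uncertainties, and the helper name `normal_matrix_condition` are hypothetical, and the condition number is used here only as an illustrative diagnostic, not as the algorithm of Henry (1983) or Hopke (1985).

```python
import numpy as np

def normal_matrix_condition(A, sigma):
    """Condition number of A^T W A with w_ii = 1 / sigma_i^2."""
    A = np.asarray(A, dtype=float)
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    return np.linalg.cond(A.T @ W @ A)

# Two clearly different hypothetical source profiles (4 species x 2 sources).
A_distinct = np.array([[0.20, 0.01],
                       [0.03, 0.15],
                       [0.05, 0.02],
                       [0.01, 0.08]])

# A third hypothetical source that is almost the average of the first two.
near_copy = 0.5 * (A_distinct[:, 0] + A_distinct[:, 1]) \
    + 1e-4 * np.array([1.0, -1.0, 1.0, -1.0])
A_collinear = np.column_stack([A_distinct, near_copy])

sigma = np.full(4, 0.05)  # assumed measurement uncertainties

cond_ok = normal_matrix_condition(A_distinct, sigma)
cond_bad = normal_matrix_condition(A_collinear, sigma)
print(f"distinct sources:  cond = {cond_ok:.3g}")
print(f"collinear sources: cond = {cond_bad:.3g}")
```

A very large condition number signals that small measurement errors can be amplified into large positive and negative contributions, which is the symptom described above.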
We should note, once more, that during the derivation of (26.15) and (26.16) we have
assumed that the atmospheric transformation terms $f_{ij}$ are equal to unity [see also (26.7)].
Therefore these equations should not be applied to species that are produced or consumed
(e.g., sulfate) during transport from source to receptor. Gravitational settling is often
assumed not to modify aij (the elemental fractions of the source emissions) to a first
approximation, even if it changes the net concentrations of these elements. This
assumption is equivalent to assuming that all elements have the same size distribution.
Application of (26.15) to gaseous pollutants that react in the atmosphere is generally not
appropriate.
FIGURE 26.1 Measured aerosol composition of three elements in seven samples for a site
influenced by automobiles (emitting Pb and Br) and a coal-fired power plant (emitting Al).
FIGURE 26.2 Plane passing through the data points of Figure 26.1. The z axis is defined by
Al = 0 and Br = (1/3) Pb.
a two-dimensional data set and the two axes correspond to the composition of the sources.
The Al and Br = (1/3) Pb axes are the principal factors influencing the aerosol concentration
at the receptor.
If there are more than three aerosol species, then we need to work with higher
than three-dimensional spaces, and locating hyperplanes passing through (or close
to) all the data points becomes a complicated exercise. The first step in the procedure
is, of course, the collection of the data set, say, k samples of n aerosol species. These
species measurements are then analyzed for the calculation of the correlation
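coefficients. If Al and Pb were two of the species measured, one would have
available the values of ($c_{\mathrm{Al},1}, c_{\mathrm{Al},2}, \ldots, c_{\mathrm{Al},k}$) and ($c_{\mathrm{Pb},1}, c_{\mathrm{Pb},2}, \ldots, c_{\mathrm{Pb},k}$), where $c_{\mathrm{Al},i}$
and $c_{\mathrm{Pb},i}$ are the Al and Pb concentrations in the ith sample. The mean Al and Pb
values will be
$$\bar{c}_{\mathrm{Al}} = \frac{1}{k} \sum_{i=1}^{k} c_{\mathrm{Al},i}, \qquad \bar{c}_{\mathrm{Pb}} = \frac{1}{k} \sum_{i=1}^{k} c_{\mathrm{Pb},i} \qquad (26.18)$$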
FIGURE 26.3 Two-dimensional depiction of the data shown in Figures 26.1 and 26.2.
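The correlation coefficient around the mean between lead and aluminum $r_{\mathrm{Pb,Al}}$ is then
defined as
$$r_{\mathrm{Pb,Al}} = \frac{\sum_{i=1}^{k} (c_{\mathrm{Pb},i} - \bar{c}_{\mathrm{Pb}})(c_{\mathrm{Al},i} - \bar{c}_{\mathrm{Al}})}{\Bigl[ \sum_{i=1}^{k} (c_{\mathrm{Pb},i} - \bar{c}_{\mathrm{Pb}})^2 \sum_{i=1}^{k} (c_{\mathrm{Al},i} - \bar{c}_{\mathrm{Al}})^2 \Bigr]^{1/2}} \qquad (26.19)$$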
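and is a measure of the interrelationship between Pb and Al concentrations. If $\sigma_{\mathrm{Al}}$ and $\sigma_{\mathrm{Pb}}$ are
the standard deviations of the corresponding samples, the correlation coefficient is given by
$$r_{\mathrm{Pb,Al}} = \frac{1}{k-1} \sum_{i=1}^{k} \frac{(c_{\mathrm{Pb},i} - \bar{c}_{\mathrm{Pb}})(c_{\mathrm{Al},i} - \bar{c}_{\mathrm{Al}})}{\sigma_{\mathrm{Pb}}\, \sigma_{\mathrm{Al}}} \qquad (26.20)$$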
If the two are completely unrelated, rPb,Al = 0. If the two variables are strongly related to
each other (positively or negatively), they have a high correlation coefficient (positive or
negative). It is important to stress at this point that high correlation coefficients do not
necessarily imply a cause-and-effect relationship. Variables can be related to each other
indirectly through a common cause. For example, let us assume that for a given receptor
Pb and Al concentrations are highly correlated; high Al values are always accompanied by
high Pb values and vice versa. One would be tempted to conclude that they have a common
source. However, the same correlation can also be a result of the fact that the lead source (a
major traffic artery) is next to the Al source (a coal-fired power plant), and depending on
the wind direction, their concentrations in the receptor vary proportionally to each other.
Correlation coefficients can be calculated for each pair of elements and a correlation
matrix (n × n) can be constructed. Note that because
$$r_{\mathrm{Pb,Al}} = r_{\mathrm{Al,Pb}} \quad \text{and} \quad r_{\mathrm{Al,Al}} = 1$$
the matrix will be symmetric, and the elements in the diagonal will be equal to unity. A
correlation matrix for nine elements measured at Whiteface Mountain, New York, is
shown in Table 26.4. The correlation matrix C is the basis of PCA. Let λ1, λ2 , . . ., λn be its
TABLE 26.4 Correlation Matrix for Elements Measured in Whiteface Mountain, New York
Na K Sc Mn Fe Zn As Br Sb
Na 1 0.48 0.03 0.22 0.21 0.14 0.13 0.04 0.08
K 0.48 1 0.46 0.57 0.61 0.03 0.30 0.28 0.15
Sc 0.03 0.46 1 0.82 0.72 -0.26 0.64 -0.06 0.07
Mn 0.22 0.57 0.82 1 0.88 -0.08 0.65 -0.03 0.13
Fe 0.21 0.61 0.72 0.88 1 -0.03 0.46 0.08 0.09
Zn 0.14 0.03 -0.26 -0.08 -0.03 1 -0.12 0.04 0.62
As 0.13 0.30 0.64 0.65 0.46 -0.12 1 -0.18 0.07
Br 0.04 0.28 -0.06 0.03 0.08 0.04 -0.18 1 0.27
Sb 0.08 0.15 0.07 0.13 0.09 0.62 0.07 0.27 1
Source: Parekh and Husain (1981).
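The construction of a correlation matrix and the calculation of its eigenvalues can be sketched as follows. This is a minimal illustration assuming NumPy; the seven synthetic samples of Pb, Br, and Al are hypothetical numbers (with Br set to one-third of Pb, in the spirit of the two-source example), not the Whiteface Mountain data.

```python
import numpy as np

def correlation_matrix(X):
    """Correlation matrix of the columns of X (k samples x n species)."""
    X = np.asarray(X, dtype=float)
    d = X - X.mean(axis=0)              # deviations from the species means
    cross = d.T @ d                     # sums of cross-products of deviations
    norm = np.sqrt(np.diag(cross))
    return cross / np.outer(norm, norm)

# Hypothetical concentrations in k = 7 samples.
pb = np.array([0.30, 0.60, 0.90, 1.20, 1.50, 0.45, 1.05])
br = pb / 3.0                           # Br tied to Pb (same assumed source)
al = np.array([1.0, 0.2, 0.7, 0.4, 1.3, 0.9, 0.1])
X = np.column_stack([pb, br, al])

R = correlation_matrix(X)

# Eigenvalues of the symmetric correlation matrix, sorted in decreasing order.
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(np.round(R, 3))
print(np.round(eigvals, 3))
```

Because Br is an exact multiple of Pb in this synthetic set, one eigenvalue is essentially zero, which reflects that three measured species are described by only two independent factors.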
Examples of PCA application can be found in Henry and Hidy (1979, 1981), Wolff and
Korsog (1985), Cheng et al. (1988), Henry and Kim (1989), Koutrakis and Spengler
(1987), and Zeng and Hopke (1989). PCA provides a rather qualitative description of
source fingerprints, which can be used later as input to a CMB model or a similar source
apportionment tool. For more information about other factor analysis approaches the
reader is referred to Hopke (1985).
$$c(x, y, t) = \sum_{i=1}^{N} a_i(t)\, \phi_i(x, y) \qquad (26.22)$$
where $\phi_i(x, y)$ are N orthogonal functions and $a_i(t)$ are time weighting functions. The
functions $\phi_i(x, y)$ include the information about the sources of the species while the time
weighting functions represent its atmospheric transport. Neglecting vertical concentration
variations and dispersion, the atmospheric diffusion equation for this species is
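$$\frac{\partial c}{\partial t} + u \frac{\partial c}{\partial x} + v \frac{\partial c}{\partial y} = Q \qquad (26.23)$$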
where u(x,y, t) and v(x, y, t) are the wind components and Q(x, y, t) is the net source term
for the species including emissions and removal processes. Using (26.22)
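$$\frac{\partial c}{\partial x} = \sum_{i=1}^{N} a_i(t)\, \frac{\partial \phi_i}{\partial x} \qquad (26.24)$$
and
$$\frac{\partial c}{\partial y} = \sum_{i=1}^{N} a_i(t)\, \frac{\partial \phi_i}{\partial y} \qquad (26.25)$$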
Substituting (26.24) and (26.25) into (26.23) and integrating from time zero to the end of
the sampling period T, one gets
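$$\int_0^T Q\, dt = c(x, y, T) - c(x, y, 0) + \sum_{i=1}^{N} \left[ \frac{\partial \phi_i}{\partial x} \int_0^T a_i(t)\, u\, dt + \frac{\partial \phi_i}{\partial y} \int_0^T a_i(t)\, v\, dt \right] \qquad (26.26)$$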
(26.27)
(26.28)
Equation (26.26) suggests that the spatial distribution of the source strength of the species
can be found using concentration and wind data if the EOFs $\phi_i(x, y)$ and the time
weighting functions $a_i(t)$ can be calculated (Henry et al. 1991).
The EOFs can be found with the following procedure. Assume that a given species is
measured simultaneously at s sites during n sampling periods. The measurements can be
used to construct the n x s concentration matrix C. The first column of C contains all the
measurements at the first site, the second column the measurements at the second site, and
so on. Equation (26.22) can be viewed as the continuous form of the singular value
decomposition of matrix C
$$\mathbf{C} = \mathbf{U} \mathbf{B} \mathbf{V}^T \qquad (26.29)$$
If a singular value is zero or very small, the corresponding eigenvectors can be removed from
the matrices U and V, leaving us with N significant eigenvectors. This step is similar to
selection of the principal components during PCA. Let U* be the n x N matrix with the
significant eigenvectors, B * the N x N diagonal matrix with the remaining singular values,
and V* the corresponding s x N matrix. Then let
$$\boldsymbol{\tau} = \mathbf{B}^{*} \mathbf{V}^{*T} \qquad (26.30)$$
The rows of τ are then the discrete EOFs and the columns of U* are the discrete-time
functions. These discrete EOFs can be interpolated in space to obtain the continuous EOFs
using, for example, 1/r² interpolation. Then (26.26) can be applied to obtain the spatial
distribution of the source strength.
Note that while PCA is applied to many samples from the same site taken over a
number of sampling periods, the EOF operates on many samples from many sites taken
over the same period.
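The decomposition and truncation steps can be sketched with a standard singular value decomposition. This is a minimal illustration assuming NumPy; the function name `discrete_eofs`, the relative tolerance used to decide which singular values are "very small", and the synthetic 6 × 4 concentration matrix are hypothetical. With τ stored as an N × s array, as the product in (26.30) implies, each row holds one EOF evaluated at the s sites.

```python
import numpy as np

def discrete_eofs(C, rel_tol=1e-10):
    """Truncated SVD of the n x s concentration matrix C.

    Returns U_star (n x N, discrete time functions in its columns) and
    tau = B* V*^T (N x s, one discrete EOF per row).
    """
    C = np.asarray(C, dtype=float)
    U, b, Vt = np.linalg.svd(C, full_matrices=False)  # C = U diag(b) Vt
    N = int(np.sum(b > rel_tol * b[0]))               # significant singular values
    U_star = U[:, :N]
    tau = np.diag(b[:N]) @ Vt[:N, :]
    return U_star, tau

# Synthetic data: n = 6 sampling periods, s = 4 sites, built from two patterns.
time_functions = np.array([[1.0, 0.2],
                           [0.8, 0.5],
                           [0.3, 1.0],
                           [0.6, 0.6],
                           [0.1, 0.9],
                           [0.9, 0.1]])
site_patterns = np.array([[4.0, 2.0, 1.0, 0.5],
                          [0.5, 1.0, 3.0, 5.0]])
C = time_functions @ site_patterns                    # rank-2 concentration matrix

U_star, tau = discrete_eofs(C)
print(U_star.shape, tau.shape)
```

The retained factors reproduce the concentration matrix, which is a useful sanity check before interpolating the EOFs in space.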
Assumptions implicit in the use of the EOF are
The last two assumptions are rarely met in practice. The spatial resolution of the EOF is
limited by the number of observation sites and the distance between them. Sudden changes
of windspeed and direction during a sampling period often result in problems.
Applications of the EOF have been presented by Gebhart et al. (1990), Ashbaugh et al.
(1984), Wolff et al. (1985), and Henry et al. (1991). Henry et al. (1991) compared
simulated two-dimensional data generated by a simple dispersion model and the above-
described version of the EOF using simple wind fields. One of the comparisons is shown in
Table 26.7. For this comparison a sampling site was located in each square and the model
was able to reproduce the location of the two sources. However, the source strength is
underpredicted as a result of numerical diffusion to the neighboring cells.
Air pollutant concentrations are inherently random variables because of their dependence
on the fluctuations of meteorological and emission variables. We already have seen from
Chapter 18 that the concentration predicted by atmospheric diffusion theories is the mean
concentration ⟨c⟩. There are important instances in analyzing air pollution where the
ability simply to predict the theoretical mean concentration ⟨c⟩ is not enough. Perhaps the
most important situation in this regard is in ascertaining compliance with ambient air
quality standards. Air quality standards are frequently stated in terms of the number of
times per year that a particular concentration level can be exceeded. In order to estimate
whether such an exceedance will occur, or how many times it will occur, it is necessary to
consider statistical properties of the concentration. One object of this chapter is to develop
the tools needed to analyze the statistical character of air quality data.
Hourly average concentrations are the most common way in which urban air pollutant
data are reported. These hourly average concentrations may be obtained from an
instrument that actually requires a 1-h sample in order to produce a data point or by
averaging data taken by an instrument having a sampling time shorter than 1 h. If we deal
with 1 h average concentrations, those concentrations would be denoted by $c_\tau(t_i)$, where
τ = 1 h. For convenience we will omit the subscript τ henceforth; however, it should
be kept in mind that concentrations are usually based on a fixed averaging time. There
are 8760 h in a year, so that if we are interested in the statistical distribution of the 1 h
average concentrations measured at a particular location in a region, we will deal with a
sample of 8760 values of the random variable c.
The random variable is characterized by a probability density function p(c), such that
p(c) dc is the probability that the concentration c of a particular species at a particular location
will lie between c and c + dc. Our first task will be to identify probability density functions
(pdf's) that are appropriate for representing air pollutant concentrations. Once we have
determined a form for p(c), we can proceed to calculate the desired statistical properties of c.
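Before a functional form for p(c) is chosen, the same quantities can be estimated empirically from a year of hourly data. The sketch below assumes NumPy; the synthetic concentrations are drawn from a lognormal distribution purely to have data to work with (an assumption of this example), and the threshold of 120 is a hypothetical value in the same arbitrary units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 8760 one-hour average concentrations.
# The lognormal shape and its parameters are assumptions used only to generate data.
c = rng.lognormal(mean=np.log(40.0), sigma=0.6, size=8760)

# Normalized histogram: the bar heights approximate p(c), so that the
# sum of height * bin width equals 1.
heights, edges = np.histogram(c, bins=50, density=True)
widths = np.diff(edges)

threshold = 120.0                        # hypothetical concentration level
n_exceed = int(np.sum(c > threshold))    # number of hours above the level
frac_exceed = n_exceed / c.size          # empirical exceedance probability

print(f"area under histogram = {np.sum(heights * widths):.6f}")
print(f"hours above {threshold}: {n_exceed} ({100 * frac_exceed:.2f}% of the year)")
```

Counting exceedances directly is limited to the concentrations actually sampled; a fitted probability density function is what allows statements about rarer events.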
If we plot the frequency of occurrence of a concentration versus concentration, we
would expect to obtain a histogram like that sketched in Figure 26.4a. As the number of
data points increases, the histogram should tend to a smooth curve such as that in
Figure 26.4b. Note that very low and very high concentrations occur only rarely. We recall
that aerosol size distributions exhibited a similar overall behavior; there are no particles of
FIGURE 26.4 Hypothetical distributions of atmospheric concentrations: (a) histogram and (b)
continuous distribution.