
THE PREDICTION OF THE NUMBERS OF VIOLATIONS OF STANDARDS AND THE FREQUENCY OF AIR POLLUTION EPISODES USING EXTREME VALUE THEORY

P. G. SURMAN, J. BODERO and R. W. SIMPSON

School of Australian Environmental Studies, Griffith University, Nathan, Brisbane, Queensland, Australia 4111

Abstract: Extreme value theory is discussed in a manner suitable for scientists working in the air pollution area. The method of application of the theory is described by means of an example analysis on an ozone data set and the theory is applied to several data sets collected in Brisbane, Queensland, Australia. The theory is used to predict the number of violations of WHO and U.S. standards expected in the year following data collection. Comparison of these predictions with the relevant observations shows that the theory does quite well. Ways of using extreme value theory as an air quality management tool are suggested.

Key word index: Extreme value theory, air pollution episodes, air pollution standards.

1. INTRODUCTION

The theory of extreme values deals with the behaviour of the largest and smallest observations in a statistical series and is used for the forecast of extremes. The theory has been successfully applied to many physical phenomena, e.g. wind speeds, rainfalls, wave heights, discharges of rivers, breaking strength of materials. It finds most application in hydrology where it is used for the prediction of floods. So far it has been little used in the air pollution area.

Under the assumptions that air pollutant concentrations are lognormally distributed and statistically independent, Singpurawalla (1972) used extreme value theory to interpret certain empirical relationships of the well known Larsen Model (Larsen, 1969). Barlow (1973) undertook similar work but relaxed the assumption of normality to one of a general group of distributions including the gamma, Weibull and lognormal. Following on from their earlier works, Barlow and Singpurawalla (1974) showed that if the assumption of independence is replaced by a more realistic one of association (i.e. the observations constitute a stationary autoregressive process with positive autocorrelations and normally distributed random shocks), then the extreme value approximation provides a lower bound on the probability that any specified air pollution standard will be violated.

Horowitz and Barakat (1979) also interpreted Larsen's model using extreme value theory. In addition, they demonstrated that even if the air pollutant concentrations result from a non-stationary autocorrelated process the asymptotic approximation given by extreme value theory fits the data well.

Roberts (1979a, b) provides a comprehensive review of extreme value theory. He applied the theory to various air pollutant data sets to predict the return period of relevant standards. He showed that for data to be adequately represented by the theory of extremes, extraordinary occurrences and trends should be removed. Berger et al. (1982) found that the two-parameter exponential distribution, which is directly related to extreme value theory, fitted air pollution data above the 95 percentile well, and thus provides more accurate estimates of these large values than the often used lognormal distribution of the Larsen model. Other work (Surman, unpublished) has shown that the assumption of lognormality for air pollutant concentrations is unreasonable and that divergence, especially in the higher percentiles, increases with averaging time. As interest is usually directed at the higher percentiles for violations of a standard or prediction of an air pollution episode, it is obvious that the statistical distribution chosen to fit the data should be accurate in the upper percentile region.

The purpose of this work is to demonstrate the usefulness of extreme value theory in the air pollution area for predicting violations of air quality standards and the frequency of air pollution episodes. The theory is explained briefly and then applied to several data sets from the Queensland State Government's Division of Air Pollution Control continuous monitoring network in Brisbane. The data sets chosen are those which represent primary and secondary pollutants resulting from motor vehicle emissions. It is proposed that examination of such data using extreme value theory provides a tool for management and forward planning of the air quality of an urban area.

2. EXTREME VALUE THEORY

Roberts (1979a, b) gives a comprehensive review of the principles and underlying assumptions of extreme value statistics as applied to the air pollution area. His works are based on Gumbel (1954, 1958) and readers are advised to consult these two authors for details of the theory as only a brief discussion of its essential components is presented here.

The large and small values assumed by a random variable from a finite number of measurements are termed 'extreme values' and are themselves random variables. Consider a set of $n$ measurements of the random variable $X$ resulting in the ordered sequence $x_1 > x_2 > \ldots > x_n$. If the experiment is repeated $N$ times the result will be $N$ such ordered sequences which may be rearranged into $n$ ordered sequences, each consisting of $N$ values of $X_m$, where $m = 1, \ldots, n$ and $X_m$ is the $m$th largest value of $X$ from a sample of size $n$. For each value of $m$ the observed $x_m$ are arranged in descending order and the resulting sequence is denoted by $x_m(r)$, $r = 1, \ldots, N$, where $r$ denotes the rank of a particular $x_m$ within the sequence of $X_m$ values.

(ii) Asymptotic distribution

If the initial distribution of $X$, $F(x)$, is unknown, but is assumed to be of the 'exponential type' (for example normal, lognormal, exponential, gamma, Weibull or beta), then Gumbel's Type I asymptote can be used to approximate the distribution of the upper extremes (Gumbel, 1958). The distribution of the $m$th largest value, $x_m$, from a sample of size $n$ is denoted as $G_{mn}(x)$. An approximation for this distribution is given by the asymptotic expression:

$$\tilde{G}_{mn}(y_m) = \exp[-\exp(-y_m)] \qquad (1)$$

where

$$y_m(x) = \alpha_{mn}(x - u_{mn}) \quad \text{(Gumbel, 1954).} \qquad (2)$$

For example, if one is interested in the maximum value from a sample size of $n$, i.e. $m = 1$, an approximation for the distribution of $X_1$ [denoted $G_{1n}(x)$] is given by

$$\tilde{G}_{1n}(y_1) = \exp[-\exp(-y_1)] \qquad (3)$$

where

$$y_1(x) = \alpha_{1n}(x - u_{1n}). \qquad (4)$$

$\tilde{G}_{1n}(y_1)$ is the asymptotic expression for $G_{1n}(x)$, i.e. the probability that the largest observation $x_1$ in a sample size of $n$ is less than $x$.

The parameters $\alpha_{mn}$ and $u_{mn}$ are dependent on the initial distribution $F(x)$ and are functions of $m$ and $n$. They are estimated from the experimental data.

Roberts (1979b) shows that the expected value of $G_{mn}(x)$ is equivalent to the probability, $P_{rN}$, of a value being ranked $r$ out of $N$ observations, i.e.

[Equation (5)]

The asymptotic theory is applied by fitting the theoretical expressions given in (1) and (2) to the experimental data. The numerical values of $P_{rN}$ from Equation (5) are substituted for $\tilde{G}_{mn}(y_m)$ in Equation (1), which is then inverted to give the $N$ values of the asymptotic variates, $y$, denoted $y_m(r)$, $r = 1, \ldots, N$, using

$$y_m(r) = -\ln(-\ln P_{rN}). \qquad (6)$$

(iii) Parameter estimation

At this point there is a one to one correspondence between the experimental data $x_m(r)$ and the asymptotic variates $y_m(r)$ and there remains the task of fitting the straight line between $x_m$ and $y_m$. It is important to note that small differences in the estimation of the parameters may have a large influence on the return period of a given value and hence are critical for planning purposes.

Gumbel's method optimizes the vertical and horizontal departures from the line. The required parameter estimates are

[Equation (7)]

The observations and the theoretical relationship may be plotted on suitable probability paper to show the goodness of fit and to indicate if any observations may be extraordinary occurrences or discordant with the rest of the data set. The required paper, as shown in Fig. 1, has the asymptotic variable $y_m(r)$ at the desired linear scale on the lower abscissa; selected probabilities ($P_{rN}$) on the upper abscissa; and the observed variable $x_m(r)$ at the desired linear scale on the ordinate.

(iv) Return period

Of the possible realizations of $X$, a specific value, $x_s$ (say the U.S. standard), may be of particular significance. The return period $R(x_s)$ for the standard is the number of sets of experiments that would need to be performed such that, on average, one realization of $X$ would equal or exceed the standard. Alternatively, it is said that $X$ would equal or exceed $x_s$ an average of once in $R(x_s)$ experiments. Roberts (1979b) gives the return period as

$$R(x_s) = [1 - G_{mn}(x_s)]^{-1}. \qquad (8)$$

Note that $n$ determines the units of the return period. The return period can also be included on the probability paper as shown in Fig. 1.

(v) Extraordinary occurrences

Gumbel (1954, 1958) has given numerical values for the standard errors to be used in constructing confidence bands about the theoretical straight line of Equation (2). The standard error assigned to each individual value depends on the rank of the value. Table 1 provides the 95% confidence interval $\pm\Delta x$ about $\hat{x} = \hat{u} + y/\hat{\alpha}$ for the four highest ranked observations of the maximum value.
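
The fitting steps of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the plotting position $(N + 1 - r)/(N + 1)$ is an assumed form for Equation (5), ordinary least squares of $x$ on $y$ stands in for Gumbel's estimator of Equation (7), and all function names are hypothetical.

```python
import math


def plotting_position(rank, n_obs):
    """Assumed form of Equation (5) for descending rank r out of N values."""
    return (n_obs + 1 - rank) / (n_obs + 1)


def reduced_variate(prob):
    """Equation (6): y = -ln(-ln P)."""
    return -math.log(-math.log(prob))


def fit_asymptote(values):
    """Fit x = u + y/alpha by least squares (a stand-in for Equation (7)).

    Returns (alpha, u) for the line y = alpha * (x - u) of Equation (2).
    """
    xs = sorted(values, reverse=True)  # rank 1 is the largest value
    n_obs = len(xs)
    ys = [reduced_variate(plotting_position(r, n_obs))
          for r in range(1, n_obs + 1)]
    mean_x = sum(xs) / n_obs
    mean_y = sum(ys) / n_obs
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_yy  # slope of x on y equals 1/alpha
    return 1.0 / slope, mean_x - slope * mean_y


def return_period(x, alpha, u):
    """Equation (8): R = 1 / (1 - G), with G from Equations (1) and (2)."""
    g = math.exp(-math.exp(-alpha * (x - u)))
    return 1.0 / (1.0 - g)


# Synthetic check: data lying exactly on a known line should be recovered.
TRUE_ALPHA, TRUE_U, N_OBS = 0.025, 130.0, 96
synthetic = [TRUE_U + reduced_variate(plotting_position(r, N_OBS)) / TRUE_ALPHA
             for r in range(1, N_OBS + 1)]
alpha_hat, u_hat = fit_asymptote(synthetic)
print(alpha_hat, u_hat)
```

Least squares of $x$ on $y$ minimizes only the departures in $x$, so its estimates will generally differ somewhat from those of a method that balances vertical and horizontal departures; because small parameter changes can shift a return period considerably, the choice of estimator matters in practice.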

[Figure 1: probability plot. Lower abscissa: $y_1$, variate of the asymptotic distribution (-2 to 7). Upper abscissae: probability (0.01 to 0.998) and return period (months). Ordinate: observed concentration.]

Fig. 1. Observed and theoretical largest 1-h ozone concentration per month, Brisbane, across three sites.
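
The confidence-interval test for an extraordinary occurrence can be illustrated with the Fortitude Valley NO$_2$ record discussed in Section 3(iii). The sketch below is an illustration rather than the authors' procedure: it assumes the plotting position $(N + 1 - r)/(N + 1)$ for Equation (5), takes the half-widths from Table 1 and the asymptote that includes the extraordinary occurrence from Table 2 ($\alpha = 0.0166$, $u = 104.024$), and uses hypothetical function names.

```python
import math

# Half-widths of the 95% interval for ranks 1 to 4, in units of 1/alpha (Table 1).
HALF_WIDTH_FACTORS = {1: 3.07, 2: 1.78, 3: 1.35, 4: 1.17}


def value_on_line(rank, n_obs, alpha, u):
    """Concentration on the fitted line at the plotting position of this rank."""
    prob = (n_obs + 1 - rank) / (n_obs + 1)  # assumed form of Equation (5)
    y = -math.log(-math.log(prob))           # Equation (6)
    return u + y / alpha                     # Equation (2) solved for x


def confidence_interval(rank, n_obs, alpha, u):
    """95% interval about the line for one of the four highest ranks."""
    centre = value_on_line(rank, n_obs, alpha, u)
    half_width = HALF_WIDTH_FACTORS[rank] / alpha
    return centre - half_width, centre + half_width


def is_discordant(value, rank, n_obs, alpha, u):
    """True when the observed value falls outside its confidence interval."""
    low, high = confidence_interval(rank, n_obs, alpha, u)
    return not (low <= value <= high)


# Highest-ranked of 96 monthly maxima, NO2 at Fortitude Valley.
low, high = confidence_interval(1, 96, 0.0166, 104.024)
print(f"{low:.1f} < x(1) < {high:.1f}")
print(is_discordant(642.0, 1, 96, 0.0166, 104.024))
```

With the rounded Table 2 coefficients the interval comes out within about 1 $\mu$g m$^{-3}$ of the 195 to 565 $\mu$g m$^{-3}$ quoted in Section 3(iii), and the observed 642 $\mu$g m$^{-3}$ lies outside it.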

Table 1. 95% confidence intervals about the line ($y_m$ vs $x$) representing $G_{mn}(x)$

Rank of observed maximum value, $r$    1                        2                        3                        4
$\Delta x$                             $\pm 3.07/\hat{\alpha}$  $\pm 1.78/\hat{\alpha}$  $\pm 1.35/\hat{\alpha}$  $\pm 1.17/\hat{\alpha}$

Adapted from Roberts, 1979b.

Note that points that fall outside the confidence bands need not necessarily be erroneous but in order that the extreme value theory may represent the data they are excluded (Roberts, 1979b).

3. APPLICATION

It is obvious that no matter how long the data record analysed, there is a certain probability that longer observation will produce a value larger than the existing observed maximum. However, compromise must be made between sample size, time available for the experiment, available funds and other factors and the confidence with which one desires to make predictions from the data. The objective is to establish the total number of observations $T$, such that for a given confidence of $(1 - p)$, less than $i$ values in the next $z$ observations will be greater than the maximum observed in the sample of $T$. Roberts (1979b) shows that

$$T = z/t, \quad (i = 1). \qquad (9)$$

This is interpreted as, for example, if a confidence of 90% is required that no values greater than the observed maximum should be observed in the following year, 9 years of data should be analysed. If one is less stringent and is content with one value in the future trials being greater than the observed maximum of the experimental record then

$$T = z/[t + \sqrt{t^2 + t}], \quad (i = 2) \qquad (10)$$

where $t = p/(1 - p)$ (Roberts, 1979b). Hence for a 90% confidence that only one value in the following year will exceed the maximum of the record, 2.2 years of data should be analysed.

The formulae of (9) and (10) are meant as guidelines for the experimental design. They simply reinforce the concept that the longer the data record the more confidence one has in the predictions that are made from it. This is especially important in the air quality area where the data record should reflect the variety of meteorological conditions which are experienced in an area and hence will include those conditions which lead to air pollution episodes. It should be noted, however, that the longer the data record the more likely that the emissions may exhibit a trend, either due to more emission sources or, conversely, due to tighter emission control. Most developed countries have instigated continuous air monitoring programmes and hence records of good length and quality are becoming available.

(i) The data

In Queensland such a programme was established in Brisbane at the inner city Fortitude Valley site in 1975; at the SW suburban site of Rocklea in 1977; and at the near coastal site of Eagle Farm in 1978. Of the data collected at these sites, the following were analysed: Fortitude Valley, 1976-1983, NO$_2$, CO, O$_3$; Rocklea, 1977-1983, O$_3$, 1978-1983, NO$_2$; Eagle Farm, 1978-1983, O$_3$, 1979-1983, NO$_2$. In addition the

highest O$_3$ concentration across these three Brisbane sites for 1976-1983 was analysed. Note that from Equation (9) we can say that for 8, 7 and 6 years of data we are 89, 88 and 86%, respectively, confident that a maximum greater than that of the data record will not occur in the following year. Perusal of the 1984 data shows that no daily maximum 1-h concentration was greater than the maximum of the respective record up to the end of 1983. At the sites half-hourly averages of ambient air pollutant concentrations are recorded continuously, i.e. there are 17,520 observations for each pollutant at each site each year. The 1-h averages are computed as the arithmetic averages of pairs of 1/2-h concentrations. An assumption of independence of such 1-h observations will not be valid; however, if we consider the initial random variable, $X$, as the maximum 1-h average in a day, then the assumption of independence is strengthened. We will confine our analysis to this definition of $X$.

(ii) Example analysis: ozone, Brisbane

If we consider that within any month, $i$, there are, on average, 30 observations of the daily maximum 1-h O$_3$ concentration, then these can be ordered from highest to lowest in the form $x_{i,1} > x_{i,2} > \ldots > x_{i,30}$. Repeating the process for each month of the 8 years of data available results in 96 such ordered sequences; the first of these being denoted as $x_{1,1}, x_{1,2}, \ldots, x_{1,30}$ and the last as $x_{96,1}, x_{96,2}, \ldots, x_{96,30}$. By selecting the highest value, $x_{i,1}$, from each sequence we derive another sequence of 96 observations which represent realizations of the random variable $X_1$, i.e. the maximum 1-h concentration in a day. (As most standards of interest are written in terms of maximum 1-h concentrations, other derived sequences are not analysed here.) These final data were obtained from the Annual Reports of the Air Quality Council of Queensland. Missing values in the $X_1$ data set were replaced by the relevant monthly mean as graphing of the data showed that there was a seasonal effect but no annual trend. For example, the missing observation for March 1976 for NO$_2$ at the Fortitude Valley site was replaced by the average of the remaining March observations of NO$_2$ at this site.

The 96 observations are ranked from highest to lowest so that a new ordered sequence $x_1(r)$, $r = 1, \ldots, 96$ results. The probability of a given $x_1$ value having rank $r$ out of a sample of 96 is calculated using Equation (5) and the corresponding asymptotic variate $y_1(r)$ is calculated from Equation (6). If a value is repeated $j$ times it is assigned $j$ successive ranks from $r$ to $r + j - 1$ for the purpose of fitting the theoretical line. For the purpose of plotting, the points corresponding to the first and last ranks of the repeated value are shown connected by a line. The point shown on the interval is the value of $y_1$ corresponding to the rank $\sqrt{r(r + j - 1)}$ (Roberts, 1979b). See for example the value of 118 $\mu$g m$^{-3}$ in Fig. 1.

The parameters of the theoretical line are determined using Equation (7). It can be seen from Fig. 1 that the asymptotic theory of extremes fits the monthly maximum well (coefficient of determination = 0.995).

(iii) Results

Using Table 1, confidence intervals were placed about the four highest ranked observations of each data set. It was found that the largest value for NO$_2$ at Fortitude Valley of 642 $\mu$g m$^{-3}$ did not lie within the confidence interval for the highest ranked value (195 $\mu$g m$^{-3}$ < $x(1)$ < 565 $\mu$g m$^{-3}$). The plot of these data is shown in Fig. 2. For the application of the

[Figure 2: probability plot. Lower abscissa: $y_1$, variate of the asymptotic distribution (-2 to 7). Upper abscissae: probability and return period (months). Ordinate: observed concentration.]

Fig. 2. Observed and theoretical largest 1-h nitrogen dioxide concentration per month, Fortitude Valley site.

extreme value theory this data point was considered discordant with the rest of the data set and was replaced with the relevant monthly mean and the analysis was repeated.

The equations for the theoretical lines for each data set together with the coefficients of determination appear in Table 2. Note the improvement in the coefficient of determination from 0.792 to 0.979 for NO$_2$ at Fortitude Valley when the discordant value was replaced.

The return period for any concentration can be read directly from the appropriate graph or calculated as in the following example. Consider the Australian National Health and Medical Research Council (NHMRC) long-term goal for O$_3$, i.e. the daily maximum 1-h concentration must not exceed 240 $\mu$g m$^{-3}$ on more than one occasion per year. For Brisbane, using the appropriate Equation (4) from Table 2, $y_1(240) = 2.71828$ and using Equation (3), $\tilde{G}_{1,30} = 0.936142$ which from Equation (8) yields a return period, $R(240) = 15.66$ months, i.e. it is expected that the value of 240 $\mu$g m$^{-3}$ for the daily maximum 1-h O$_3$ concentration will be equalled or exceeded once in 16 months. This shows that this standard would be expected to be met in the following year. Consider now the World Health Organization (WHO) long-term goal of a 1-h concentration of 120 $\mu$g m$^{-3}$. For the Brisbane data set, its return period is 1.31 months, i.e. in the coming year we would expect this goal to be equalled or exceeded as the daily maximum 1-h concentration on 10 occasions. Note that the WHO goal specifies only a 1-h concentration and not the daily maximum 1-h concentration and hence the prediction is a lower bound on the number of violations expected. Table 3 presents the relevant standards, their return periods and the number of violations expected and observed in the coming year for each pollutant and site analysed.

(iv) Air quality management

The maximum expected for a certain return period can be used in source reduction management problems

Table 2. Equations of the asymptotes and coefficients of determination

Data set                                                    Equation of asymptote              Coefficient of determination

O$_3$   Fortitude Valley                                    $y_1(x) = 0.0272 (x - 106.850)$    0.988
        Rocklea                                             $y_1(x) = 0.0261 (x - 113.400)$    0.988
        Eagle Farm                                          $y_1(x) = 0.0357 (x - 106.772)$    0.963
        Brisbane (all sites)                                $y_1(x) = 0.0256 (x - 134.033)$    0.995
NO$_2$  Fortitude Valley (incl. extraordinary occurrence)   $y_1(x) = 0.0166 (x - 104.024)$    0.792
        Fortitude Valley (excl. extraordinary occurrence)   $y_1(x) = 0.0234 (x - 109.421)$    0.979
        Rocklea                                             $y_1(x) = 0.0361 (x - 77.630)$     0.992
        Eagle Farm                                          $y_1(x) = 0.0228 (x - 103.218)$    0.995
CO      Fortitude Valley                                    $y_1(x) = 0.1147 (x - 27.529)$     0.928
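
As a check on how Table 2 is used, the return periods of the two ozone goals can be recomputed from the asymptotes with Equations (3), (4) and (8). This is a sketch with hypothetical names; it assumes the first four asymptotes of Table 2 are the O$_3$ data sets. Because the published coefficients are rounded, the results agree with the return periods listed in Table 3 only to about the first decimal place.

```python
import math

# (alpha, u) pairs read from Table 2; assumed to be the O3 data sets.
O3_ASYMPTOTES = {
    "Fortitude Valley": (0.0272, 106.850),
    "Rocklea": (0.0261, 113.400),
    "Eagle Farm": (0.0357, 106.772),
    "Brisbane (all sites)": (0.0256, 134.033),
}


def return_period_months(conc, alpha, u):
    """Return period in months of a daily-maximum 1-h concentration."""
    y = alpha * (conc - u)           # Equation (4)
    g = math.exp(-math.exp(-y))      # Equation (3)
    return 1.0 / (1.0 - g)           # Equation (8)


periods = {
    (site, goal): return_period_months(goal, alpha, u)
    for site, (alpha, u) in O3_ASYMPTOTES.items()
    for goal in (240.0, 120.0)
}

for (site, goal), months in periods.items():
    print(f"{site:22s} {goal:5.0f} ug/m3  R = {months:6.1f} months")
```

The units of the result are months because each fitted value is the largest of the roughly 30 daily maxima in one month.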

Table 3. Air quality goals/standards, return periods and expected and observed number of violations

                                 Concentration                           Return period    No. of violations in 1984
Goal/standard                    ($\mu$g m$^{-3}$)  Site                  (months)         Predicted    Observed

O$_3$ (NHMRC long-term goal)*    240                Fortitude Valley      38.0             0            0
                                                    Rocklea               27.6             0            0
                                                    Eagle Farm            117.0            0            0
                                                    Brisbane (all sites)  15.7             0            0
O$_3$ (WHO long-term goal)*      120                Fortitude Valley      2.0              6            15
                                                    Rocklea               1.8              7            16
                                                    Eagle Farm            2.2              6            11
                                                    Brisbane (all sites)  1.3              10           33
NO$_2$†                          320                Fortitude Valley      220.3            0            0
                                                    Rocklea               13067.4          0            0
                                                    Eagle Farm            141.6            0            0
CO (U.S. National Standard)‡                        Fortitude Valley      4.7              3            0

* As defined in text.
† 1-h concentration not to be exceeded more than once per month.
‡ 1-h concentration not to be exceeded more than once per year.
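
The management calculation discussed in Section 3(iv) runs the asymptote in reverse: given a return period, find the concentration expected to be equalled or exceeded once in that period. The sketch below makes two assumptions that should be checked against the original tables: that the Fortitude Valley asymptote with $\alpha = 0.1147$ and $u = 27.529$ in Table 2 is the CO data set in mg m$^{-3}$, and that return periods are in months as in Table 3. The function names are hypothetical.

```python
import math


def return_period_months(conc, alpha, u):
    """Equations (4), (3) and (8) applied in turn."""
    g = math.exp(-math.exp(-alpha * (conc - u)))
    return 1.0 / (1.0 - g)


def concentration_for_return_period(months, alpha, u):
    """Invert Equations (8), (3) and (4) for the concentration."""
    g = 1.0 - 1.0 / months           # Equation (8) solved for G
    y = -math.log(-math.log(g))      # Equation (3) solved for y
    return u + y / alpha             # Equation (4) solved for x


# Assumed CO asymptote for Fortitude Valley (Table 2), concentrations in mg/m3.
CO_ALPHA, CO_U = 0.1147, 27.529

once_per_year = concentration_for_return_period(12.0, CO_ALPHA, CO_U)
print(f"Level equalled or exceeded once per year: {once_per_year:.1f} mg/m3")
```

Under these assumptions the result rounds to the 49 mg m$^{-3}$ quoted in the text. Repeating the call with other return periods gives the levels needed to compare alternative reduction strategies.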

where attainment of the standard may not be immediately feasible. For example, over 97% of CO entering the Brisbane airshed results from motor vehicle emissions. Examination of Table 3 shows that the maximum 1-h concentration of CO at the Fortitude Valley site will equal or exceed the U.S. standard on 3 days per year (compared with 23 such occurrences in the 8 years prior to 1984). Of course there may be several violations of the standard on any one of those days. The value expected to be equalled or exceeded once per year is 49 mg m$^{-3}$. Hence, on average, a reduction of 9 mg m$^{-3}$ needs to be achieved at this site. Using the information contained in an emission inventory for the airshed this can be translated into either a required reduction in the number of vehicles passing the site or an efficiency factor for use with emission control devices for motor vehicles. Most large cities have emission inventories. If an immediate reduction of 9 mg m$^{-3}$ is not feasible then different strategies for various levels of reduction can be evaluated by comparing the return periods of the concentrations expected. A similar approach can be used for NO$_2$ emissions at sites near roadways.

Extreme value theory may also be useful for the prediction of (and hence forward planning for) severe air pollution episodes (analogous to the 10 or 50 year flood heights in hydrology). Given the long lead time required to enact legislation, and devise and install emission controls in industry and motor vehicles, such time spans are not irrelevant. The concept of the 50-year concentration is also important in that it makes workers in the air pollution area aware that very large concentrations can still occur despite stringent control measures.

Unlike the usual empirical models used in the air pollution area, the extreme value theory model does not rely on the assumption of a specific frequency distribution of the initial data. It is a statistical model and is predictive into the future. The confidence one has in such predictions is increasing as longer records become available from continuous air monitoring networks. As the experimental record increases in length, diverse meteorological conditions will be reflected in the data. These meteorological conditions are controlling variables of the stochastic process which results in air pollution concentrations. The other controlling variables in the process are the levels of emissions. Obviously these will vary through time due to either tighter control or increased sources of emissions in a given area. The resulting effect may be a trend to either lower or higher concentrations in the record which should be removed from the data prior to the analysis.

The model's main use is in the prediction of the number of violations expected for a standard in a given period or the return period for any given value. It is also useful as an air management tool for the presentation of scenarios to aid decision makers to determine needed source reductions and for predicting severe air pollution episodes.

Acknowledgements: The authors express their appreciation to Chris Czarkowski for writing the computer program for the analysis and to the staff of the Division of Air Pollution Control, Queensland, Australia for their assistance with the data.

REFERENCES

Air Pollution Council of Queensland, Annual Reports (19??-1984) Government Printer, Brisbane, Australia.
Barlow R. E. (1972) Averaging time and maxima for air pollution concentrations. Proc. 6th Berkeley Symp. on Mathematical Statistics. University of California, Berkeley, CA.
Barlow R. E. and Singpurawalla N. (1974) Averaging time and maxima for dependent observations. Proc. Symp. on Statistical Aspects of Air Quality Data. Report No. EPA 650/4-74-038. U.S. Environmental Protection Agency, Research Triangle Park, NC.
Berger A., Melice J. L. and Demuth C. L. (1982) Statistical distributions of daily and high atmospheric SO$_2$ concentrations. Atmospheric Environment 16, 2563.
Gumbel E. J. (1954) Statistical Theory of Extreme Values and Some Practical Applications. National Bureau of Standards, Applied Mathematics Series 33, U.S. Govt. Printing Office.
Gumbel E. J. (1958) Statistics of Extremes. Columbia University Press, Columbia.
Horowitz J. and Barakat S. (1979) Statistical analysis of the maximum concentration of an air pollutant: effects of autocorrelation and non-stationarity. Atmospheric Environment 13, 811-818.
Larsen R. I. (1969) A new mathematical model of air pollutant concentration averaging time and frequency. J. Air Pollut. Control Ass. 19, 24.
Roberts E. M. (1979a) Review of statistics of extreme values with applications to air quality data-I. Review. J. Air Pollut. Control Ass. 29, 632.
Roberts E. M. (1979b) Review of statistics of extreme values with applications to air quality data-II. Applications. J. Air Pollut. Control Ass. 29, 733.
Singpurawalla N. (1972) Extreme values from a log normal law with applications to air pollution problems. Technometrics 14, 703.
