
J.Y. Koo*, M.J. Yu*, S.G. Kim*, M.H. Shim* and A. Koizumi**
* Department of Environmental Engineering, University of Seoul, 90, Jeonnong-dong, Dongdaemun-gu,
Seoul, Korea (E-mail: jykoo@uos.ac.kr; myong@uos.ac.kr; sgkim-75@hanmail.net;
blue9842@hotmail.com)
** Department of Environmental Civil Engineering, Tokyo Metropolitan University, Hachioji City, 192-0397
Japan (E-mail: akoiz@ecomp.metro-u.ac.jp)
Abstract In Seoul, the multiple regression models were used to estimate future water demand and to verify
the ability of the water supply to cope with regional development. A regional development project extending
over two districts was planned to stimulate the regional economy of Seoul in October 2003, and multiple
regression models for each district were developed to verify the capacity of water facilities and the retention
time of reservoirs. Two variables, the population and the area of the commercial district, were used to
express domestic and commercial water usage. Coefficients for variables of models should be positive
values; however, the coefficient for population was negative in Jung-gu. The prediction of water demand with
one regression formula for each district may not be sufficient to characterize the water use pattern of a
district. So, by characterizing each sub-district of the two districts, applying principal component and cluster
analysis, they were divided into residential and commercial groups. Then, multiple regression models with
the same variables were developed for each group. As a result, the models not only had positive coefficients
for all variables, but also provided reasonable sensitivities for the variables. The commercial area had
nearly the same sensitivity in both groups, but the population in the commercial group showed a higher
sensitivity than in the residential group, because people living in the commercial area do not have to go to
another district to work or sleep. Future water demands were estimated for three scenarios of
regional development, using the existing and newly developed models. The water demands estimated by the
newly developed model are 3,416 to 11,372 ton/day less than those estimated by the existing model. Therefore,
the developed model gave the correct water demand and prevented a wrong decision from being made.
Keywords Cluster analysis; forecasting water demand; principal component analysis

Water Science and Technology: Water Supply Vol 5 No 1 pp 1–7 IWA Publishing 2005

Estimating regional water demand in Seoul, South Korea, using principal component and cluster analysis

Introduction

Water demand is estimated from factors related to water usage, such as population, weather, and water price
(Billings and Jones, 1996), and a number of factors and methods have been developed to forecast it
(Reynaud, 2002; Arbues et al., 2003). When a water demand forecast model is selected, however, it must be
considered whether the model is accurate and uses adequate variables. From this viewpoint, a multiple
regression model with two variables, population and the area of the commercial district, has been used in
Seoul, and until now this model has been able to forecast the water demand. However, the model is built for
each district, and some of its coefficients can take unreasonable values if the regions are not divided
properly. It is therefore necessary to divide the regions correctly, because Seoul is a large city that
contains many different kinds of regions. Dividing regions by their features was first used to some degree
in classifying the towns of England and Wales (Moser and Scott, 1961). Moser and Scott used
principal-component and cluster analysis to divide the towns into groups that have the same regional
features. The features of each group were then analyzed and extracted after these analyses (Grove and
Roberts, 1980). These methods have also been confirmed by other research, which means that they have
methodological and theoretical validity (Morell et al., 1995). We thought that these methods could be
adopted in this study to forecast water demand reasonably. Therefore, the purposes of this study were the
division of districts using principal-component and cluster analysis; the development of a new water demand
model; and the estimation of the factors' sensitivities and their effect on forecasting water demand.
J.Y. Koo et al.

Method
Background of this study

Seoul is the capital city of Korea, and has a population of about ten million. Seoul metropolitan city is
composed of 21 districts, known as gu, as shown in Figure 1, and 522 sub-districts, known as dong. The Han
River runs through the center of Seoul. The original part of the city was built north of the Han River.
As industrialization in South Korea progressed, the city has grown rapidly since the 1960s, and regional
development of the area south of the Han River has continued since the 1970s. By comparison, the part of
Seoul north of the Han River has remained relatively undeveloped. For this reason, the Seoul Metropolitan
Government planned a regional development project for two districts (Jung-gu and Jongro-gu) located north of
the Han River. The two districts are supplied by two water facilities and have four reservoirs. It therefore
became essential to forecast the water demand and to verify the capacity of the water supply before going
ahead with the development. A multiple regression model with two variables has been used to forecast the
water demand for Seoul (Koo and Joo, 1998). The two variables are the population and the area of the
commercial district: the former represents domestic usage, and the latter represents commercial usage.
Generally, the coefficient of each variable represents the capacity or sensitivity of the variable.
Therefore, the coefficients for both variables must be positive in the multiple regression model. In this
case, however, the coefficient for population in Jung-gu had a negative value. The reason is the disparity in
regional characteristics between the sub-districts; with such a model, we could not obtain the correct
future water demand or the sensitivities of the factors. We thought that the solution might be a regional
division based on the characteristics of the sub-districts.
Modeling method

In this study, we used multiple regression, principal-component analysis, and cluster analysis. Principal-component analysis was used to change raw data into statistically reasonable
data, and cluster analysis was used to organize cases. Then, the multiple regression model was
formulated using an organized data-set. The explanations of these methods are as follows.
Multiple regression model

The multiple regression model is often used in many study areas, because it can be easily modeled using
simple assumptions. The model is composed of independent and dependent variables, and can be verified from
three viewpoints. The first is the accuracy of the values predicted by the model. The second is the
multi-collinearity between the independent variables, and the third is whether the errors of the model are
normally distributed. Moreover, the sensitivities of the variables are estimated from their coefficient
values (Goldberg and Cho, 2003).
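As an illustration of the two-variable model described above, the following is a minimal sketch of an ordinary least-squares fit of Y = b1·X1 + b2·X2 + b0 through the normal equations. The function names and the input values are hypothetical and are not taken from the study; the inputs are generated from known coefficients only so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_variable_model(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 + b0; returns [b1, b2, b0]."""
    rows = [[p, a, 1.0] for p, a in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def r_squared(x1, x2, y, beta):
    """Coefficient of determination of the fitted model."""
    b1, b2, b0 = beta
    predicted = [b1 * p + b2 * a + b0 for p, a in zip(x1, x2)]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, predicted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical inputs (not data from the study): population in thousands and
# commercial area in hectares for five sub-districts.
population_k = [5.0, 12.0, 8.0, 20.0, 15.0]
area_ha = [10.0, 5.0, 30.0, 8.0, 20.0]
# Demand generated from known coefficients so the fit can be verified.
demand = [0.2 * p + 0.5 * a + 1.0 for p, a in zip(population_k, area_ha)]

beta = fit_two_variable_model(population_k, area_ha, demand)
fit_quality = r_squared(population_k, area_ha, demand, beta)
```

With real data the fitted coefficients would then be inspected for sign and size, which is the check that exposes the negative population coefficient discussed in this paper.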
Principal component analysis

Principal-component analysis is a mathematical procedure that transforms a number of
possibly correlated variables into a smaller number of uncorrelated variables. Using principal-component
analysis, raw data are changed into data that have statistical meaning. Also,
the ability to express raw data is estimated by using the eigenvalue of the principal component. Therefore,
the principal components are usually selected by using eigenvalues (Jolliffe, 2002).

Figure 1 The Seoul metropolitan city: (a) Seoul area; (b) Jongno and Jung-gu areas
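The principal-component transformation described above can be sketched as an eigen-decomposition of the correlation matrix of the standardized variables. The example below is a minimal pure-Python sketch that uses cyclic Jacobi rotations; the paper does not say which software or algorithm was used, so the algorithm choice, the function names, and the input values are all assumptions made for illustration.

```python
import math


def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (population statistics)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / n for j in range(m)] for i in range(m)]


def jacobi_eigen(matrix, sweeps=100, tol=1e-24):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4.0, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J, V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p and q: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    eigenvalues = [a[i][i] for i in order]
    eigenvectors = [[v[k][i] for k in range(n)] for i in order]
    return eigenvalues, eigenvectors


# Hypothetical indicator values for six sub-districts (not data from the study).
rows = [
    [2.0, 1.0, 0.5],
    [3.0, 1.5, 0.4],
    [4.0, 2.2, 0.9],
    [5.0, 2.4, 0.2],
    [6.0, 3.1, 0.7],
    [7.0, 3.4, 0.3],
]
corr = correlation_matrix(rows)
eigenvalues, eigenvectors = jacobi_eigen(corr)
# Share of the total variance expressed by each principal component.
explained = [ev / sum(eigenvalues) for ev in eigenvalues]
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum gives the proportion of the raw data that the component expresses.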
Cluster analysis

Cluster analysis is a useful method for organizing cases. Cases are organized
according to their similarity, based on the distance between them in space. There are several
methods to compute this distance. Ward's method attempts to minimize the sum of
squares of any two clusters that can be formed at each step. It tends to create clusters of
small size, but in general it is regarded as very efficient. Therefore,
Ward's method was used in this study (Brian et al., 2001).
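The merging rule of Ward's method can be sketched as follows: at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares are merged. This is a naive illustration, not the implementation used in the study; the function names and the two-dimensional scores are hypothetical.

```python
def centroid(points):
    """Mean position of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_clusters(points, k):
    """Agglomerate points into k clusters; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(
                    [points[t] for t in clusters[i]],
                    [points[t] for t in clusters[j]],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # Merge the cheapest pair and drop the absorbed cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical principal-component scores for six cases (not data from the study).
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
groups = ward_clusters(scores, 2)
```

Recording the merge cost at each step would give the heights needed to draw a dendrogram of the kind used later to read off the groups.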
Results and discussion
Multiple regression model using an existing method

In this section, we explain why we started this study. First, to estimate the water
demand, a multiple regression for each district was performed using the data for the dongs.
The districts of this regional development project are Jung-gu and Jongro-gu in the center
of northern Seoul. Jung-gu and Jongro-gu contain the dongs (sub-districts) shown in Table
1. The multiple regression models of Jung-gu and Jongro-gu were formulated as shown in
Table 2. The coefficient for the population in the Jung-gu model had a negative value,
which is not reasonable. A commercial area has a relatively low population, but a
higher water demand than a residential area. So, if the regression model is formulated for
an area with mixed commercial and residential uses, the coefficient of population can be
negative. This is why the population could have a negative effect in the
regression analysis. We thought that the cause was the disparity among the dongs.
Therefore, principal-component and cluster analysis were adopted to solve the problem.
Area data and principal component analysis

Data were sought that properly express the features of a district that are connected to
water demand. Many data were available by gu, but not many were available by dong
(sub-district). We could collect only six factors: domestic usage/population
(m³/capita), commercial usage/area of commercial district (m³/m²), employees/population
(capita/capita), population/total area of dong (capita/m²), number of businesses/total area of
dong (number/m²), and area of commercial district/total area of dong (m²/m²). These
were analyzed as the regional features affecting water demand, using the data in the Seoul
Statistical Year Book 2000 (Information division of Seoul Metropolitan Government,
2001). Principal-component analysis was carried out to change the data for the six
factors into statistically reasonable data (principal-component data). The results are presented
in Table 3 and Table 4. The first four principal components account for 92.6% of the sum of the
eigenvalues. This is enough to express the meaning of the data, so the first four principal
components were selected for the following cluster analysis.

Table 1 Dongs of Jongro-gu and Jung-gu

District          Dong
Jongro-gu (19)    Chung-un; Hyo-ja; Sa-jik; Sam-chung; Pu-am; Pyung-chang; Mu-ak;
                  Kyo-nam; Ka-hoe; Jongro 1, 2, 3, 4 ga; Jongro 5, 6 ga; I-hwa;
                  Hye-hwa; Myoung-nyun 3 ga; Chang-shin 1; Chang-shin 2;
                  Chang-shin 3; Sung-in 1; Sung-in 2
Jung-gu (15)      So-gong; Hoe-hyun; Myong; Pil; Chang-chung; Kwang-hui;
                  Ulciro 3, 4, 5 ga; Shin-dang 1; Shin-dang 2; Shin-dang 3;
                  Shin-dang 4; Shin-dang 5; Shin-dang 6; Hwang-hak; Chung-nim

Table 2 Formalization of the multiple regression model of Jongro-gu and Jung-gu

District     R-squared    Estimated equation
Jongro-gu    0.9585       Y = 0.2794X1 + 0.0055X2 - 131.53
Jung-gu      0.8686       Y = -0.0979X1 + 0.0030X2 + 4767.70

X1: population, X2: area of commercial district (m²)

Table 3 The results of the principal-component analysis

Item    Eigenvalue    Difference    Proportion    Cumulative
1       2.9076        1.7866        0.4846        0.4846
2       1.1210        0.1765        0.1868        0.6714
3       0.9445        0.3638        0.1574        0.8289
4       0.5807        0.2824        0.0968        0.9256
5       0.2983        0.1504        0.0497        0.9754
6       0.1479                      0.0246        1.0000
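The proportion and cumulative columns of Table 3 follow directly from the eigenvalues, and the short sketch below recomputes them. The eigenvalues are those reported in Table 3; the 90% cut-off is an assumption made for illustration, since the paper only states that 92.6% was judged sufficient.

```python
# Eigenvalues of the six principal components, as reported in Table 3.
eigenvalues = [2.9076, 1.1210, 0.9445, 0.5807, 0.2983, 0.1479]

total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Running sum of the proportions gives the cumulative column.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)


def components_needed(cumulative_shares, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative_shares, start=1):
        if share >= threshold:
            return count
    return len(cumulative_shares)


# Assumed cut-off of 90% of the total; not a value stated in the paper.
n_selected = components_needed(cumulative, 0.90)
```

Under this assumed cut-off the first four components are retained, which matches the selection made in the study.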

Table 4 Principal-component scores for six factors

Item    Prin. 1    Prin. 2    Prin. 3    Prin. 4
X1      0.3405     0.1515     0.7457     0.2730
X2      0.3194     0.6371     0.0909     0.6144
X3      0.4857     0.2434     0.2585     0.1000
X4      0.3134     0.6294     0.4102     0.2954
X5      0.3968     0.3355     0.3771     0.6702
X6      0.5390     0.0565     0.2417     0.0400

X1 = domestic usage/population (m³/capita); X2 = commercial usage/area of commercial district (m³/m²);
X3 = employees/population (capita/capita); X4 = population/total area of dong (capita/m²); X5 = number of
businesses/total area of dong (number/m²); X6 = area of commercial district/total area of dong (m²/m²)

The cluster analysis and the new multiple regression model


In this study, cluster analysis using the four principal components was carried out to
organize the regions.
Of all the methods of cluster analysis, Ward's method was used in this study because it
minimizes the loss of data information. The results are shown in Table 5 and Figure 2. From
the results of the cluster analysis, the dongs were divided into two groups. The first group
contains 24 dongs and the second group contains 10. The first group is mostly
composed of residential districts, and the second group is composed of commercial districts.
Therefore, the cluster analysis gave a satisfactory result. A multiple
regression model was then developed for each group, as shown in Table 6.
All the coefficients of each model are now positive. Therefore, the irrationality of the
regression model can be considered solved. However, the R² values
of the newly developed models were lower than those of the existing models, because the variation
within each newly developed model is in a narrower range than that of the existing model. Therefore,
it is hard to say that the forecasting ability of the newly developed model is lower than that of the
existing model.
Table 5 Characterization data, using cluster analysis

Group (group's feature)            Dong (code)
Group 1 (Residential districts)    Chung-un (1), Hyo-ja (2), Sa-jik (3), Sam-chung (4), Pu-am (5),
                                   Pyung-chang (6), Mu-ak (7), Kyo-nam (8), Ka-hoe (9), I-hwa (12),
                                   Hye-hwa (13), Myoung-nyun 3 ga (14), Chang-shin 2 (16),
                                   Chang-shin 3 (17), Sung-in 1 (18), So-gong (20), Pil (23),
                                   Chang-chung (24), Shin-dang 2 (28), Shin-dang 3 (29),
                                   Shin-dang 4 (30), Shin-dang 5 (31), Shin-dang 6 (32),
                                   Chung-nim (34)
Group 2 (Commercial districts)     Jongro 1, 2, 3, 4 ga (10), Jongro 5, 6 ga (11), Chang-shin 1 (15),
                                   Sung-in 2 (19), Hoe-hyun (21), Myong (22), Kwang-hui (25),
                                   Ulciro 3, 4, 5 ga (26), Shin-dang 1 (27), Hwang-hak (33)

Figure 2 Dendrogram constructed using cluster analysis

Table 6 Regression models for Jongro-gu and Jung-gu, using the results of principal-component analysis

Group                              R-squared    Estimated formula
Group 1 (Residential districts)    0.7903       Y = 0.1414X1 + 0.0054X2 + 1145.20
Group 2 (Commercial districts)     0.8660       Y = 0.3785X1 + 0.0041X2 + 491.96

X1: population, X2: area of commercial district (m²)

We could obtain other information: the sensitivity of the factors. The sensitivity of the factors
affecting water supply can also be expressed with the coefficients of the variables. The
population effect in the commercial districts is about 2.7 times larger than in the
residential districts. People in the commercial districts do not have to move
to another district: they work and sleep in the same district, so it is presumed that they use
more water there than people living in the residential districts.
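The sensitivity comparison can be reproduced from the coefficients in Table 6. The sketch below computes the ratio of the population coefficients and the ratio of the commercial-area coefficients, and evaluates both group models for an illustrative input. The function name and the input values (10,000 people, 50,000 m² of commercial area) are hypothetical and do not come from the study.

```python
# Coefficients of the group models reported in Table 6.
GROUP_MODELS = {
    "residential": {"b_pop": 0.1414, "b_area": 0.0054, "intercept": 1145.20},
    "commercial": {"b_pop": 0.3785, "b_area": 0.0041, "intercept": 491.96},
}


def predict_demand(group, population, commercial_area_m2):
    """Evaluate Y = b_pop*X1 + b_area*X2 + intercept for the given group."""
    model = GROUP_MODELS[group]
    return (
        model["b_pop"] * population
        + model["b_area"] * commercial_area_m2
        + model["intercept"]
    )


# Relative sensitivities: commercial group divided by residential group.
population_ratio = GROUP_MODELS["commercial"]["b_pop"] / GROUP_MODELS["residential"]["b_pop"]
area_ratio = GROUP_MODELS["commercial"]["b_area"] / GROUP_MODELS["residential"]["b_area"]

# Hypothetical sub-district: effect of 1,000 additional residents in each group.
base_population, base_area = 10_000, 50_000
extra_residential = (
    predict_demand("residential", base_population + 1_000, base_area)
    - predict_demand("residential", base_population, base_area)
)
extra_commercial = (
    predict_demand("commercial", base_population + 1_000, base_area)
    - predict_demand("commercial", base_population, base_area)
)
```

The population ratio comes out close to the factor of about 2.7 quoted in the text, while the commercial-area coefficients of the two groups are of similar size.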
The comparison of existing and developed models

There are three regional development scenarios for Jongro-gu and Jung-gu: active, moderate,
and passive plans. The water demands of the study area for 2011, as estimated by the
existing and developed models, are shown in Figure 3.
Future water demand as predicted by the newly developed model was increased by
11,372 m³/day for the passive regional development project, 8,255 m³/day for the moderate
regional development project, and 3,416 m³/day for the active regional development project.
As the maximum operating ratio of the water facilities in summer can often become
very high, differences of this size could affect decisions regarding the future water supply and
distribution system. Therefore, the predictions made by the newly developed model produce a
meaningful result.

Figure 3 Water demand estimated by the existing and developed models for 2011 (vertical axis: future water demand, m³/day, from 100 000 to 200 000; series: existing model and developed model; categories: passive, moderate, and active regional development projects)

Conclusions


When regional development projects were planned in Seoul, a multiple regression model
with two variables for each district was used to forecast the water demand and to verify the
ability of the water supply to cope with demand. However, the population coefficient in the
Jung-gu model had a negative value, which was not reasonable. The reason was that the
division of the districts was not done properly. Therefore, we used principal-component
and cluster analysis to reorganize the region more reasonably: the region was divided into
two groups according to the results, and a multiple regression model was developed for each
group, giving the following results.
1. In the newly developed models, all the coefficients had positive values.
Therefore, statistically reasonable models were obtained, even if the R-squared values of the
newly developed models were lower than those of the existing models.
2. The sensitivity of the factors was estimated from the variables' coefficients in the models.
There was no significant difference between the groups for the area of the commercial
district. However, there was a significant difference (about 2.7 times) for the population.
3. For the three regional development scenarios for Jongro-gu and Jung-gu (active, moderate,
and passive regional development), the future water demand was
found to increase. Therefore, the developed model gave the correct water demand figures and
reduced the probability of a wrong decision.
Acknowledgements

This study was supported by Grant (code 4-2-1) from the Sustainable Water Resources
Research Center of the 21st Century Frontier Research Program.
References
Arbués, F., García-Valiñas, M.A. and Martínez-Espiñeira, R. (2003). Estimation of residential water
demand: a state-of-the-art review. Journal of Socio-Economics, 32, 81–102.
Billings, R.B. and Jones, C.B. (1996). Forecasting Urban Water Demand. American Water Works
Association, Denver, Colorado.
Brian, S.E., Sabine, L. and Morven, S. (2001). Cluster Analysis, 4th edn, Edward Arnold, London.
Goldberg, M.A. and Cho, H.A. (2003). Introduction to Regression Analysis, Springer Verlag, Heidelberg.
Grove, D.M. and Roberts, C.A. (1980). Principal component and cluster analysis of 185 large towns in
England and Wales. Urban Studies, 17, 77–82.
Information division of Seoul Metropolitan Government (2001). Seoul Statistical Yearbook 2000. Seoul
Metropolitan Government, Seoul, Korea.
Jolliffe, I.T. (2002). Principal Component Analysis, 2nd edn, Springer Verlag, Heidelberg.
Koo, J.Y. and Joo, C.N. (1998). Analysis of Infrastructure Capacity of Seoul. Institute of Urban Sciences,
University of Seoul, Seoul, Korea.
Morell, I., Giménez, E. and Esteller, M.V. (1995). Application of principal components analysis to the
study of salinization on the Castellón Plain (Spain). Science of the Total Environment, 177, 161–171.
Moser, C.A. and Scott, W. (1961). British Towns. Oliver and Boyd, London.
Reynaud, A. (2002). An econometric estimation of industrial water demand in France. Environmental and
Resource Economics, 25, 213–232.
