Paper presented at 'The Census of Population: 2000 and beyond', Manchester, June
22-23rd 2000.
1. Project Overview
This project sets out to explore the role of place in ‘determining’ income. This has
been made possible only by the completion of the recent large-scale Census
Rehearsal, which included a question on income. As a result, for the first time in the
UK a dataset exists that captures the income of individuals located within spatially
contiguous households.
The results from the project are to be fed back to the Census Offices and other users
to better inform strategies for estimating (and imputing) individual, household and
small-area level incomes.
2. Paper Overview
This paper offers a few preliminary thoughts on the problem of estimating spatially
detailed patterns of income distribution. A major and unforeseen delay in the
production of a dataset potentially holding the key to this problem necessarily means
that these thoughts are rather more preliminary than initially anticipated. Indeed, at
the time of writing, the planned research on which this report was to have been based
has been formally postponed until July 2000, pending data availability. This interim
report, therefore, serves primarily to outline the planned future research. However, it
hopefully also represents a useful and concise summary of the state-of-the-art in
terms of small area income imputation in the UK. To this end, the report draws
together three separate documents: a) An outline of the planned research into income
imputation, highlighting the potential significance of having access for the first time
to spatially detailed Census Rehearsal data; b) A brief review of the main alternative
approaches to income imputation in current use; c) Some initial thoughts on the
potential utility of council tax data for improving current income estimates.
3. The imputation of income: an outline of planned research
3.1 Overview
A strong case has been made by a wide constituency of user groups for the inclusion
of an income question in the 2001 Census. Nonetheless, the Census Office retains a
number of legitimate concerns over the wisdom of such a course of action. In the light
of these arguments, both for and against the inclusion of an income question in the
census, the current government white paper on the 2001 Census (Cm 4253) has
proposed that an income question should be trialled in the 1999 Census Rehearsal. A
firm decision regarding the inclusion of income in the 2001 Census will be made in
the light of results from this trial.
It is proposed here to take advantage of the unique nature of the data collected by this
1999 Census Rehearsal. For the first time, a UK dataset will exist that captures the
income of individuals located within spatially contiguous households for a large
number of enumeration districts. Analysis of these data will allow extensive
exploration of the role of place in ‘determining’ income. In particular, analysis of
these data will permit three key objectives to be met:
1. An evaluation of extant and currently proposed methods of small area income
imputation.
2. An evaluation of the use of other Census based measures as proxies for
income.
3. An assessment of the effectiveness of incorporating non-Census information in
the small area income imputation process, looking in particular at the role of house
prices as measured by council tax bands.
Irrespective of the decision eventually reached by the government, the fruits of this
research will be of use, both to the Census Office and to the wider user community.
The main outcome of the proposed research will be the identification of the best
possible strategy, dependent upon data availability, for imputing small area income. If
an income question is included in the 2001 Census, the research proposed here will
offer the Census Office the opportunity to be involved in the evaluation of a wider
range of item imputation strategies for income than would otherwise have been
possible. If an income question is not included, then the Census Office will be able to
demonstrate that they have actively collaborated in meeting the government’s stated
preferred aim of ‘identify[ing] possible alternative means of securing relevant
information’ (Cm 4253, para. 103), by helping to improve small area income
imputation methodology. This would also contribute directly to ongoing efforts
elsewhere within the Office for National Statistics to develop small area income
estimates. In either case the benefit to the wider user community will be an
improvement over the current situation with regards to the availability of accurate
estimates of small area income distributions.
Consultations before the 1991 Census and again in the run up to the 2001 Census
have shown almost overwhelming support for the inclusion of an income question in
the Census (Rees, 1998; Cm 4253). As Dorling (1999) has argued elsewhere, it has
become increasingly recognised that inequalities in income distribution appear to
underpin a wide range of social phenomena, from voting and leisure habits, through to
long term health prospects and, ultimately, life expectancy. That this movement in
opinion has not been confined solely to academic circles was perhaps demonstrated
most graphically in the recent white paper ‘Our Healthier Nation’, which set out the
government’s intent "to improve the health of the worst off in society and to narrow
the health gap" (Cm 3852, p5, my emphasis).
Set against these strong arguments for the inclusion of an income question in the
forthcoming 2001 Census, it needs to be recognised that there remain legitimate
concerns that have yet to be fully addressed. These include the likely accuracy of any
data collected due to mis-reporting, the possibility of substantial non-response to an
income question, in tandem with differential non-response bias and, worst of all, the
possibility of reduced overall enumeration rates. Even if the potential reduction in
overall response rates remains small, the danger is that non-response would rise most
amongst those groups of people who are already the hardest to enumerate. Only a
careful assessment of the outcome of the 1999 Census Rehearsal can inform the final
government decision on this question.
Over time a wide range of methods have been adopted to impute income for small
areas. These methods can be divided into two main camps: those that seek to impute
aggregate measures of income for small geographic areas, and those that seek in
addition to impute measures of income for individual people and households within
those same areas.
More generally, the single most common way in which the lack of information on
income within small areas is dealt with is to refer to census variables seen as plausible
substitutes for income. By common consensus these include car ownership, the
occupation based measure of ‘social class’, employment status and household tenure.
These indicators, either singly or in combination, have been used to underpin many
studies into the impacts of (income) inequalities. Multiple ‘deprivation’ indices in
common use, both within government and academia, include the Townsend index and
the Department of the Environment Index of Local Conditions. The problem with such
studies is that, being based upon indirect measures of wealth, interpretation of results
is not always straightforward. For example, having controlled for age and education,
does the observed relationship between Social Class and health reflect income (or the
lack of it) or social standing/status within the community?
A third alternative is to impute income data for individuals and household records in
publicly available population microdata. Williamson (forthcoming), GSS (1997) and
David et al. (1986) review a range of imputation strategies that could be adopted in
this case, ranging from sub-group means through regression to neural networks and
donor imputation. At present the Census Office is proposing to impute all missing
variables in the 2001 Census using a revised version of the ‘Hot Deck’ imputation
system adopted during the 1991 Census (Vickers and Yar, 1998). As a result, research
is currently underway to fine-tune the performance of this methodology for each
proposed variable in the 2001 Census, including income.
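To make the idea of donor imputation concrete, the following is a minimal sketch of a random hot deck within imputation classes. It is not the Census Office system, whose details are not described here; the function name, the record layout and the choice of class variables are all assumptions made for illustration.

```python
import random


def hot_deck_impute(records, target, class_vars, seed=0):
    """Fill missing `target` values by copying the value of a randomly
    chosen donor that shares the same values on `class_vars`.

    Illustrative only: `records` is a list of dicts and a missing value
    is represented by None (both are assumptions of this sketch).
    """
    rng = random.Random(seed)

    # Build a pool of observed (donor) values for each imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors.setdefault(key, []).append(rec[target])

    imputed = []
    for rec in records:
        new = dict(rec)  # copy, so the input records are left untouched
        if new[target] is None:
            pool = donors.get(tuple(new[v] for v in class_vars))
            if pool:  # with no donor in the class, the value stays missing
                new[target] = rng.choice(pool)
        imputed.append(new)
    return imputed


# Hypothetical records: income is missing for the last two people.
people = [
    {"age": "30-44", "tenure": "owner", "income": 300},
    {"age": "30-44", "tenure": "owner", "income": 340},
    {"age": "30-44", "tenure": "owner", "income": None},
    {"age": "65+", "tenure": "renter", "income": None},
]
completed = hot_deck_impute(people, "income", ["age", "tenure"])
```

In this sketch the third person receives the income of one of the two owner-occupier donors, while the fourth, who has no donor in the same class, remains missing. A production system would need a fallback rule for such cases.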
Outside of government, Dale et al. (1995) have imputed income to individuals in
the SAR, albeit for relatively large geographical areas, using mean earnings for
population sub-groups derived from the New Earnings Survey. This deterministic
type of approach ensures that the overall sub-group mean income will be correct, but
also leads to reduced within-group income variance and distorts relationships with
income for variables not considered in the defining imputation sub-groups. Bramley
and Lancaster (1998) report the imputation of income for individuals and households
contained in geographically detailed population microdata released from the Scottish
House Condition Survey. Again income was imputed separately for a range of
population sub-groups, but in this case a stochastic element was retained, ensuring
that not only the mean but also the overall sub-group income distributions were
retained. Geographically detailed microdata are not available for England and Wales.
Therefore, Williamson (1995) first estimated the requisite small-area population
microdata before also stochastically imputing individual and household income.
Crucially, in all three of the examples above, the lack of a direct measure of income
for small areas made a thorough evaluation of the estimates produced impossible.
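The contrast between deterministic and stochastic sub-group imputation can be sketched as follows. This is an illustration of the general principle only, not a reconstruction of any of the studies cited: the donor incomes are hypothetical, and the stochastic variant simply resamples observed values, which is one of several ways a stochastic element could be retained.

```python
import random
import statistics


def impute_group_mean(donor_incomes, n_missing):
    """Deterministic: every missing case receives the sub-group mean."""
    mean = statistics.fmean(donor_incomes)
    return [mean] * n_missing


def impute_stochastic(donor_incomes, n_missing, seed=0):
    """Stochastic: every missing case receives a random draw from the
    incomes observed in the sub-group (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [rng.choice(donor_incomes) for _ in range(n_missing)]


# Hypothetical weekly earnings observed for one population sub-group.
donors = [120.0, 180.0, 240.0, 300.0, 410.0]

det = impute_group_mean(donors, 1000)
sto = impute_stochastic(donors, 1000)

# The deterministic values have no spread at all within the sub-group,
# whereas the stochastic values retain the spread of the donors.
det_variance = statistics.pvariance(det)
sto_variance = statistics.pvariance(sto)
```

Both approaches recover the sub-group mean, exactly in the deterministic case and approximately in the stochastic case, but only the stochastic approach retains any within-group variance.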
None of these features by themselves are unique amongst government surveys, but in
combination they represent an entirely new dataset. These data offer the opportunity
to answer a number of key questions, the single most important of which is
identifying the spatial scale at which variations in income operate. In other words,
they offer the opportunity to quantify the importance of place in determining income.
Do individuals with similar key characteristics (such as age and occupation) have
similar incomes, irrespective of location (having taken account of known regional pay
differentials), or does place matter? The same data allow the answering of a second
key question: how the income of individuals/households in small areas should best be
estimated.
The quality of the data collected in the 1999 Census Rehearsal is, at the time of
writing, an unknown. However, it can be anticipated that it will have a number of key
characteristics:
• under-enumeration (due both to non-response and missed households)
• some degree of mis-reporting of income, both deliberate and accidental,
particularly for sub-groups of the population such as the self-employed and pensioners
• increased levels of item non-response for certain questions, but especially for
income
These shortcomings should not be denied, but nor should they be over-played. The
collected data can be expected to have a fairly high response rate (well in excess of
50%, on average), and the vast majority of responding households (well in excess of
80%), can be expected to have completed the full test Census forms, including the
question on income. These data are entirely adequate to evaluate the importance of
place in determining income, and for evaluating alternative income imputation
schemes. However, they are unlikely to be of sufficient quality to provide accurate
estimates of the income for smaller geographic areas, especially for those ‘hard to
count’ areas with the lowest response rates.
The research proposed here has three main aims, set within the broader framework of
exploring the importance of place in determining individual and household incomes:
• To evaluate currently existing and proposed methods of income imputation
• To evaluate the use of other Census based measures as proxies for income
• To evaluate the incorporation of other, non-Census, information in income
imputation
Section 3.3 above reviewed the main methods of small-area income imputation that
have been used in the recent past, or are proposed for use in the near future. Using
each of these methodologies in turn, it is proposed to evaluate the effectiveness of
imputation on the Census Rehearsal dataset, by attempting to impute the income for
individuals with known income and, crucially, spatial location.
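One way such an evaluation could be organised is sketched below: each known income is masked in turn, imputed from the remaining records, and compared with the true value. The two imputation rules, the record layout and the data are hypothetical and serve only to show the mechanics. They are not the methods under review.

```python
import statistics


def evaluate_imputation(records, impute_fn):
    """Mask each known income in turn, impute it from the other records,
    and return the mean absolute error of the imputed values."""
    errors = []
    for i, rec in enumerate(records):
        others = records[:i] + records[i + 1:]
        errors.append(abs(impute_fn(rec, others) - rec["income"]))
    return statistics.fmean(errors)


def overall_mean(rec, others):
    """Ignore location: impute the mean income of all other records."""
    return statistics.fmean([r["income"] for r in others])


def area_mean(rec, others):
    """Use location: impute the mean income of other records in the same
    area, falling back to the overall mean if the area has no others."""
    same = [r["income"] for r in others if r["area"] == rec["area"]]
    return statistics.fmean(same) if same else overall_mean(rec, others)


# Hypothetical individuals in two areas with very different incomes.
records = [
    {"area": "ED1", "income": 100.0},
    {"area": "ED1", "income": 110.0},
    {"area": "ED1", "income": 120.0},
    {"area": "ED2", "income": 300.0},
    {"area": "ED2", "income": 310.0},
    {"area": "ED2", "income": 320.0},
]

error_without_place = evaluate_imputation(records, overall_mean)
error_with_place = evaluate_imputation(records, area_mean)
```

In this contrived example the rule that uses location has a much smaller error, because income varies far more between the two areas than within them. Whether the same holds in real data is precisely the question the Census Rehearsal dataset allows to be asked.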
In so far as it is possible, given the data quality issues discussed above, it is proposed
to compare small area based estimates of income with proxy measures, such as % car
ownership, tenure, social class and so on. Multivariate measures of deprivation
commonly used as proxies for poverty, geodemographic profiles and the Townsend
and DoE indices will also be compared to actual observed levels of income. It is
recognised that this goal, although in many ways the simplest to perform, is the one
most likely to prove problematic, due to the likely nature of data quality issues with
the Census Rehearsal discussed above. Nonetheless, the aim will be to obtain at least
some firm indications of the value of each income proxy.
It is possible that, having controlled for regional pay differentials and relevant
individual and household census characteristics, spatial variations in income still
persist. This would suggest that certain variables not captured in the Census
contribute to ‘determining’ income. Such variables might include house price, rents
and local environment (proximity to major roads or open spaces). It is proposed to
augment the most successful income imputation strategy identified in Section 4.1
above by incorporating some of these additional variables. House prices can be
estimated by reference to council tax band for individual households, which are
generally readily available, although not necessarily in machine-readable form. Other
non-census variables will be incorporated as proves practicable, depending upon the
type of information that can most readily be elicited from local sources (particularly
relevant local authorities). Time and resource limits necessarily preclude the use of
specially commissioned surveys.
3.6 Timetable for research
Data availability permitting, the revised start date for the research outlined above is 1st
July 2000, with a finish date of 30th April 2001.
The ONS has investigated using data from the Department of Social Security (DSS)
on income-related benefits (IRBs) to identify areas with a high proportion of people
on low incomes. Although useful information may be provided in this way, the
method has a number of important disadvantages. Benefit data tell us nothing about
those who are over the income thresholds. Moreover the records tell us relatively
little about recipients apart from sex and age group. The association between IRBs
and low incomes is less straightforward than might appear, in particular because
eligibility is not based solely on income, and furthermore take-up of benefits varies
between groups and areas. This approach is therefore at best supplementary.
John Stillwell and colleagues at the University of Leeds have developed a method of
decomposing the Gross Domestic Product to district or ward level, thus deriving area-
specific GDP per capita. The Annual Employment Survey was used to break down
local employment into broad sectors (manufacturing, construction, etc.), to which
statistics on GDP by sector could be applied. The resulting estimates were scaled up
or down to correspond to regional GDP (published in Regional Trends). In effect,
then, the value for any given area depends only on its sectoral composition and
regional context. While the figures are interesting as a measure of wealth creation
within particular areas, they tell us little about the personal incomes of local residents.
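The arithmetic of this decomposition, as described above, can be sketched as follows. The function name, the sector figures and the data layout are hypothetical, and the published method may differ in detail.

```python
def area_gdp_estimates(employment, gdp_per_worker, regional_gdp):
    """Apply sectoral GDP per worker to local employment by sector, then
    scale the results so that they sum to the regional GDP total."""
    raw = {
        area: sum(jobs * gdp_per_worker[sector]
                  for sector, jobs in sectors.items())
        for area, sectors in employment.items()
    }
    scale = regional_gdp / sum(raw.values())
    return {area: value * scale for area, value in raw.items()}


# Hypothetical employment counts by broad sector for two wards.
employment = {
    "ward_1": {"manufacturing": 100, "construction": 50},
    "ward_2": {"manufacturing": 20, "construction": 80},
}
# Hypothetical GDP per worker by sector (arbitrary units).
gdp_per_worker = {"manufacturing": 30.0, "construction": 20.0}

estimates = area_gdp_estimates(employment, gdp_per_worker,
                               regional_gdp=12400.0)
```

Dividing each estimate by the resident population would give area-specific GDP per capita. The sketch makes plain that two areas in the same region with the same sectoral mix receive the same value per worker, whatever the incomes of the people who live there.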
4.3 Modelling using national survey data
ONS researchers have developed models to estimate income at ED level on the basis
of data from the census and national surveys, of which the Family Resources Survey
(FRS) has been the most important. This project was greatly assisted by having
access to the exact location of each respondent, which is not available to most users.
A multi-level approach was used, with some of the relevant variables being at the area
(rather than individual) level. There are still difficulties with the method, however.
In particular, it does not capture the extent of variability at local level. The model will
inevitably be incomplete, and results will be biased by unmeasured area effects.
Glen Bramley and colleagues have developed models for small area income
estimation based on dividing the households in the local populations into subgroups,
and on assuming that income in each subgroup has a lognormal distribution. The
parameters of these local distributions are linear functions of the parameters at
national level. The subgroups relate to household composition (e.g. number of adults
and children), to the number of earners, and optionally to tenure. Once again it seems
that the models may not capture the full extent of inter-area variation. It is also
difficult to quantify the relative impact of context and composition because the local
weights used to adjust the parameters are affected by both.
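The general structure of a subgroup-based lognormal model of this kind can be sketched as a weighted mixture. The sketch below is an assumption-laden illustration rather than the authors' specification: the subgroup definitions, the national parameters, the linear adjustment coefficients and the household counts are all invented.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Proportion of a lognormal(mu, sigma) distribution lying below x."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def local_mu(national_mu, intercept, slope):
    """Local log-mean as a linear function of the national log-mean
    (hypothetical coefficients, for illustration only)."""
    return intercept + slope * national_mu


def share_below(threshold, subgroups):
    """Share of all local households with income below `threshold`,
    weighting each subgroup by its local household count."""
    total = sum(g["households"] for g in subgroups)
    below = sum(g["households"] * lognormal_cdf(threshold, g["mu"], g["sigma"])
                for g in subgroups)
    return below / total


# Hypothetical local area: two household subgroups with invented
# national parameters adjusted by invented local coefficients.
groups = [
    {"households": 300, "mu": local_mu(5.2, -0.1, 1.0), "sigma": 0.6},  # no earner
    {"households": 700, "mu": local_mu(6.0, 0.05, 1.0), "sigma": 0.5},  # one or more earners
]

low_income_share = share_below(200.0, groups)
```

Because the local result depends both on the subgroup counts (composition) and on the adjusted parameters (context), a change in either alters the estimated share, which illustrates why the two influences are hard to separate.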
Some of the key questions not fully answered by any of the approaches outlined
above are:
• What do small-area income distributions really look like?
• How are these distributions affected by place, as opposed to composition?
• To what extent do area-level effects interact with individual or household effects?
• At what spatial scale do the effects operate (e.g. region, district, ward, ED)?
• What is the direction of causation?
• How far can these effects be captured using measures derived from the census
alone?
• How useful are supplementary data on place?
A further question is the extent to which other data can be used to help improve the
accuracy of small-area income estimation. The most immediately obvious additional
information as yet unused in any approach is house value. This information is
captured, if somewhat crudely, by the valuation band into which each property in
Britain has been assigned for the purpose of assessing liability to a local authority
administered tax known as the council tax. The final section of this paper offers a few
preliminary thoughts on the apparent strengths and weaknesses of council tax banding
as an additional source of information for modelling small-area incomes.
Figures 1 and 2 map the distribution of households in the bottom (A&B) and top
(G&H) council tax bands in 2000, by postcode unit, on to a
1991 census-based measure of unemployment by enumeration district. These maps
highlight a number of features. First, there is a clear link between high levels of
unemployment and low council tax banding. Second, the link between high council
tax banding and unemployment is perhaps less clear-cut. Third, the spatial resolution
afforded by postcode units is far higher than that offered by enumeration districts.
There are some 2200 postcode units in an area containing 226 EDs. Fourth, although a
majority of postcode units are entirely homogeneous with respect to council tax
banding (see Figure 3), the same cannot be said for enumeration districts. This point
is perhaps better illustrated by Figures 4 and 5, which plot average council tax band
(where A=1 and H=8) by postcode unit onto 1991 Census enumeration districts. A
majority of enumeration districts contain postcode units with differing average
council tax bands. Figure 5, a blow-up of part of Figure 4, reinforces this impression
and clearly identifies the fact that even within high unemployment enumeration
districts there are postcode units with high council tax band averages.
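The two postcode-unit measures used in this paragraph, the average council tax band (with A=1 and H=8) and whether a unit is entirely homogeneous with respect to banding, can be computed as in the following sketch. The function names and the example properties are hypothetical.

```python
# Score the bands as in the text: A=1 through to H=8.
BAND_SCORE = {band: score for score, band in enumerate("ABCDEFGH", start=1)}


def average_band(bands):
    """Mean band score for the properties in one postcode unit."""
    return sum(BAND_SCORE[b] for b in bands) / len(bands)


def is_homogeneous(bands):
    """True if every property in the postcode unit is in the same band."""
    return len(set(bands)) == 1


# Hypothetical postcode units, each a list of property bands.
unit_low = ["A", "A", "A", "A"]
unit_mixed = ["A", "B", "G", "H"]

low_average = average_band(unit_low)
mixed_average = average_band(unit_mixed)
```

An average alone can hide heterogeneity: the mixed unit above averages 4.5, a middling band, although it contains properties from both ends of the scale. This is one reason to examine homogeneity alongside the average.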
So where does this leave us? The evidence presented suggests that council tax
banding could indeed provide additional information above and beyond that captured
by ED level aggregate measures. In modelling small-area incomes, therefore, council
tax banding might help to add further information to that captured by ward or ED
level non-census data such as benefit uptake rates. What remains unclear is the extent
to which council tax banding might help in the imputation of income at the household
or individual level. At the bottom end of the scale, the answer would appear to be not
much. The ubiquity of band A and B households does not appear to offer much
chance of differentiation between poor and very poor households (see Table 3).
However, there is at least a suggestion that the use of information on the location of
the top-banded households might well help in the identification of 'rich' and 'super-
rich' households.
Acknowledgements
The research reported here has been funded by the Economic and Social Research
Council as part of its Census Development Programme (Award no. H507 25 5166).
References
Benefits data X
Census of population X X X X X
Modelling X X X
Distributional assumptions X
Commercial data X
Geodemographic classes X
Table 2 Council tax bands by Local Authority

District/UA          County               % of properties
Easington            Durham               85
Wansbeck             Northumberland       76
Derwentside          Durham               75
Kingston-upon-Hull   Humberside           74
Scunthorpe           Humberside           74
Sedgefield           Durham               73
Manchester           Greater Manchester   71
Cumnock              Strathclyde          70
Sunderland           Tyne and Wear        70
Wear Valley          Durham               69
[Figure 1: % properties in bands A & B (postcode units) mapped on to % unemployment 1991 (enumeration districts; classes 0 to 9, 9 to 14, 14 to 20, 20 to 27, 27 to 73)]
[Figure 2: % properties in bands G & H (postcode units) mapped on to % unemployment 1991 (enumeration districts; classes 0 to 9, 9 to 14, 14 to 20, 20 to 27, 27 to 73)]
Figure 3 The concentration of council tax bands: deviation from an even distribution (as measured by SAE)
[Histogram of postcode units, N = 2215; horizontal axis: SAE as % of max. possible value]
[Figure 4: Average council tax band (postcode units) mapped on to % unemployment 1991 (enumeration districts; classes 0 to 9, 9 to 14, 14 to 20, 20 to 27, 27 to 73)]
Figure 5 Close-up of Figure 4 (part)