Calculating the Gini coefficient of inequality

Iain Buchan, June 2002 update

Purpose
This is a technical briefing note on the use of the Gini coefficient (G) for summarising
inequality.

Background
The Pan American Health Organisation (2001) promoted a 'Gini-like' statistic as a
summary measure of inequalities in health. UK Public Health Observatories have a
role in monitoring inequalities in health (Department of Health, 2001), and must
therefore consider the statistical properties of inequality measures such as the Gini
coefficient.

Statistical and computing issues

History
The Gini coefficient was developed by the Italian statistician Corrado Gini (1912) as
a summary measure of income inequality in society. It is usually associated with the
plot of wealth concentration introduced a few years earlier by Max Lorenz (1905).
Since these measures were introduced, they have been applied to topics other than
income and wealth, but mostly within economics (Cowell, 1995, 2000; Jenkins,
1991; Sen, 1973).

Statistical basis of the Gini coefficient


G is a measure of inequality, defined as the mean of absolute differences between
all pairs of individuals for some measure, divided by twice the mean of that measure.
The minimum value is 0 when all measurements are equal, and the theoretical maximum
is 1 for an infinitely large set of observations where all measurements but one have a
value of 0, which is the ultimate inequality (Stuart and Ord, 1994).
When G is based on the Lorenz curve of income distribution, it can be interpreted as
the expected income gap between two individuals randomly selected from the
population, expressed relative to twice the mean income (Sen, 1973). The Lorenz curve
is plotted as the cumulative proportion of the variable against the cumulative
proportion of the sample (e.g. for a sample of 30 observations the cumulative
proportion of the sample for the 15th observation is simply 15/30). To get the
cumulative proportion of the variable, first sort the observations in ascending order
and sum them; the kth cumulative proportion is then the sum of $x_i / x_{\mathrm{sum}}$
from i=1 to k, where $x_{\mathrm{sum}}$ is the sum of all observations.
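The Lorenz curve construction described above can be sketched in Python roughly as follows. This is a minimal illustration, not part of the original note: the function name `lorenz_curve` and the choice to start the curve at the origin (0, 0) are assumptions, and the five values are the infant mortality rates from the PAHO illustration later in this note.

```python
def lorenz_curve(values):
    """Return (sample_proportions, variable_proportions) for a Lorenz plot.

    Both lists start at 0.0 so that the curve begins at the origin; this
    starting point is a plotting convention assumed for this sketch.
    """
    xs = sorted(values)              # ascending order
    n = len(xs)
    x_sum = sum(xs)                  # sum of all observations
    sample_props = [0.0]
    variable_props = [0.0]
    running = 0.0
    for k, x in enumerate(xs, start=1):
        running += x                 # sum of x_i for i = 1..k
        sample_props.append(k / n)
        variable_props.append(running / x_sum)
    return sample_props, variable_props


sample_props, variable_props = lorenz_curve([59, 43, 39, 24, 22])
for p, v in zip(sample_props, variable_props):
    print(f"{p:.2f}  {v:.4f}")
```

Plotting `variable_props` against `sample_props` gives the Lorenz curve; the further it sags below the diagonal, the greater the inequality.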

The classical definition of G appears in the notation of the theory of relative mean
difference:
$$G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|}{2 n^2 \bar{x}}$$

- where x is an observed value, n is the number of values observed and $\bar{x}$ is the
mean value.
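As a hedged illustration of the classical definition (the sum of absolute differences over all ordered pairs, divided by twice n squared times the mean), a direct Python sketch might look like this. The function name `gini_pairwise` is an assumption for this example; the first data set is the set of infant mortality rates from the illustration later in this note.

```python
def gini_pairwise(values):
    """Gini coefficient from all n * n ordered pairs of observations."""
    n = len(values)
    mean = sum(values) / n
    # Sum of |x_i - x_j| over every ordered pair (i, j), including i == j.
    total_abs_diff = sum(abs(xi - xj) for xi in values for xj in values)
    return total_abs_diff / (2 * n * n * mean)


print(round(gini_pairwise([59, 43, 39, 24, 22]), 5))   # 0.19893
print(gini_pairwise([10, 10, 10, 10]))                  # 0.0 (perfect equality)
print(gini_pairwise([0, 0, 0, 1]))                      # 0.75, i.e. (n - 1) / n
```

The last call shows why the maximum of 1 is only reached in the limit: with all but one value equal to 0, G equals (n-1)/n.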

If the x values are first placed in ascending order, such that each x has rank i, then
some of the comparisons above can be avoided and computation is quicker:

$$G = \frac{2}{n^2 \bar{x}} \sum_{i=1}^{n} i (x_i - \bar{x})$$

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1) x_i}{n \sum_{i=1}^{n} x_i}$$

- where x is an observed value, n is the number of values observed, $\bar{x}$ is the mean
value and i is the rank of values in ascending order.
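The rank-based form, in which each sorted value x with rank i is weighted by (2i - n - 1) and the weighted sum is divided by n times the sum of the values, can be sketched as below. The function name `gini_sorted` is an assumption for illustration; the data are the infant mortality rates from the illustration later in this note.

```python
def gini_sorted(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)                      # rank i = 1 is the smallest value
    n = len(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


print(round(gini_sorted([59, 43, 39, 24, 22]), 5))   # 0.19893
```

Sorting costs O(n log n), whereas comparing all pairs costs O(n squared), which matters when the calculation is repeated thousands of times in a bootstrap.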

In order for G to be an unbiased estimate of the true population value, it should be
multiplied by n/(n-1) (Dixon et al., 1987; Mills and Zandvakili, 1997). This corrected
form of G does not appear in most of the literature, but there are few situations when
it is not the most appropriate form to use.
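A minimal sketch of the n/(n-1) correction, assuming the rank-based formula for G; the function names `gini` and `gini_unbiased` are illustrative only, and the data are the infant mortality rates from the illustration later in this note.

```python
def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


def gini_unbiased(values):
    """G multiplied by n / (n - 1); needs at least two observations."""
    n = len(values)
    return gini(values) * n / (n - 1)


imr = [59, 43, 39, 24, 22]
print(round(gini(imr), 5))            # 0.19893
print(round(gini_unbiased(imr), 6))   # 0.248663
```

With only five observations the correction factor is 5/4, so the corrected value is a quarter larger than the uncorrected one; for large n the two forms converge.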
Constructing confidence intervals for Gini coefficients
The small sample variance properties of G are not known, and large sample
approximations to the variance of G are poor (Mills and Zandvakili, 1997; Glasser,
1962; Dixon et al., 1987). Confidence intervals are therefore calculated via bootstrap
re-sampling methods (Efron and Tibshirani, 1997).

Two types of bootstrap confidence intervals are commonly used: percentile
and bias-corrected (Mills and Zandvakili, 1997; Dixon et al., 1987; Efron and
Tibshirani, 1997). The bias-corrected intervals are the more appropriate for most
applications. Dixon et al. (1987) describe a refinement of the bias-corrected method
known as 'accelerated'; this produces values very close to conventional
bias-corrected intervals.

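The percentile confidence interval is defined as:

$$\left[ g^*_{\alpha/2},\; g^*_{1-\alpha/2} \right]$$

- where g* is a Gini coefficient estimated from a bootstrap sample, the subscript
denotes the corresponding quantile of the bootstrap estimates and $\alpha$ is (100 -
confidence level)/100.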
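A percentile bootstrap interval can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method used by any particular package: the function names, the fixed seed and the nearest-rank rule for picking quantiles are all choices made for this example, so the results will not match the figures reported later in this note exactly.

```python
import random


def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


def bootstrap_ginis(values, resamples=2000, seed=12345):
    """Gini coefficients of `resamples` samples drawn with replacement."""
    rng = random.Random(seed)        # fixed seed (assumed) so runs are repeatable
    n = len(values)
    return [gini([rng.choice(values) for _ in range(n)]) for _ in range(resamples)]


def percentile_ci(estimates, confidence=95.0):
    """Percentile interval using a nearest-rank quantile (one of several conventions)."""
    alpha = (100.0 - confidence) / 100.0
    ordered = sorted(estimates)
    k = len(ordered)
    lower = ordered[round(alpha / 2 * (k - 1))]
    upper = ordered[round((1 - alpha / 2) * (k - 1))]
    return lower, upper


imr = [59, 43, 39, 24, 22]           # infant mortality rates from this note
low, high = percentile_ci(bootstrap_ginis(imr))
print(f"Percentile 95% CI = {low:.4f} to {high:.4f}")
```

Changing `seed` shows how much the interval moves between runs, which is the variation that a large number of re-samples is meant to reduce.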

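The bias-corrected confidence interval is defined as:

$$\left[ g^*_{p_1},\; g^*_{p_2} \right]$$

$$p_1 = \Phi\left( 2 z_0 - z_{1-\alpha/2} \right)$$

$$p_2 = \Phi\left( 2 z_0 + z_{1-\alpha/2} \right)$$

$$z_0 = \Phi^{-1}\left( \#(g^* < G) / k \right)$$

- where g* is a Gini coefficient estimated from a bootstrap sample, G is the observed
Gini coefficient, $\alpha$ is (100 - confidence level)/100, $\Phi$ is the standard normal
distribution function, $z_{1-\alpha/2}$ is its $1-\alpha/2$ quantile, $\#(g^* < G)$ is the
number of bootstrap estimates below G and k is the number of re-samples in the bootstrap.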
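The bias-corrected interval can be sketched along the same lines, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) for the standard normal distribution. The sketch assumes that z0 is the normal quantile of the proportion of bootstrap estimates strictly below the observed G, and that p1 and p2 are the normal probabilities of 2*z0 minus and plus the 1 - alpha/2 normal quantile. The function names, seed and nearest-rank quantile rule are assumptions for illustration.

```python
import random
from statistics import NormalDist


def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


def bias_corrected_ci(values, resamples=2000, confidence=95.0, seed=12345):
    """Bias-corrected bootstrap interval for the Gini coefficient (a sketch)."""
    rng = random.Random(seed)                 # fixed seed (assumed) for repeatability
    n = len(values)
    observed = gini(values)
    estimates = sorted(
        gini([rng.choice(values) for _ in range(n)]) for _ in range(resamples)
    )
    k = len(estimates)
    below = sum(1 for g in estimates if g < observed)    # #(g* < G)
    if below == 0 or below == k:
        raise ValueError("z0 is undefined when no or all re-samples fall below G")
    norm = NormalDist()                       # standard normal distribution
    alpha = (100.0 - confidence) / 100.0
    z0 = norm.inv_cdf(below / k)
    z = norm.inv_cdf(1 - alpha / 2)
    p1 = norm.cdf(2 * z0 - z)
    p2 = norm.cdf(2 * z0 + z)
    # Nearest-rank quantiles of the sorted bootstrap estimates (a convention).
    return estimates[round(p1 * (k - 1))], estimates[round(p2 * (k - 1))]


imr = [59, 43, 39, 24, 22]                    # infant mortality rates from this note
low, high = bias_corrected_ci(imr)
print(f"Bias-corrected 95% CI = {low:.4f} to {high:.4f}")
```

When exactly half of the bootstrap estimates fall below G, z0 is 0 and the interval reduces to the percentile interval.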

Gini-like statistics
The original Gini formula is presented in many forms, and there are Gini-like
formulae which approximate the Gini coefficient. In the context of measuring
inequalities in health, Brown (1994) presents a Gini-style index, seemingly calculated
from two variables instead of one. The two variables comprise distinct indicators of
health (y, e.g. infant deaths) and population (x, e.g. live births) for n groups sorted
by a composite measure of health and population (e.g. infant mortality rate).

$$G_b = 1 - \sum_{i=0}^{n-1} (Y_{i+1} + Y_i)(X_{i+1} - X_i)$$

- where X and Y are the cumulative proportions of the population and health variables
over the sorted groups, with X_0 = Y_0 = 0.

Gb based on two variables (e.g. infant deaths and live births) will be very similar to G
calculated from a composite measure (e.g. infant mortality rate). In most situations it
is more natural to think of inequality of the composite measure. Another reason not
to use Gb is that its statistical characteristics are not well studied.

The Pan American Health Organisation (2001) gave the following illustration:

Country     GNP per capita   IMR (per 1000 live births)   Live births (thousands)   Infant deaths
Bolivia     2860             59                           250                       14750
Peru        4410             43                           621                       26703
Ecuador     4730             39                           308                       12012
Colombia    6720             24                           889                       21336
Venezuela   8130             22                           568                       12496

Positive non-zero observations = 5
Bootstrap re-samples = 2000
Bias = 0.057218

Brown's Gb = 0.1904

Gini coefficient = 0.19893
Percentile 95% CI = 0.023645 to 0.219277
Bias-corrected 95% CI = 0.151456 to 0.241304

Unbiased estimator of population Gini coefficient = 0.248663
Percentile 95% CI = 0.029557 to 0.274096
Bias-corrected 95% CI = 0.18932 to 0.30163

[Figure: Lorenz plot for Infant Mortality Rate. Horizontal axis: proportion of sample (0.00 to 1.00); vertical axis: proportion of variable (0.00 to 1.00).]
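The point estimates reported for the PAHO illustration can be reproduced with a short Python sketch. It assumes that Brown's Gb is one minus the sum, over the sorted groups, of (Y[i+1] + Y[i]) * (X[i+1] - X[i]), where X and Y are cumulative proportions of live births and infant deaths starting from 0, and that the groups are sorted by ascending infant mortality rate. The function names are illustrative only.

```python
def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


def brown_gb(x, y):
    """Brown's Gini-style index for groups already sorted by the composite measure."""
    x_total, y_total = sum(x), sum(y)
    cum_x, cum_y = [0.0], [0.0]               # X_0 = Y_0 = 0
    run_x = run_y = 0
    for xi, yi in zip(x, y):
        run_x += xi
        run_y += yi
        cum_x.append(run_x / x_total)
        cum_y.append(run_y / y_total)
    area = sum(
        (cum_y[i + 1] + cum_y[i]) * (cum_x[i + 1] - cum_x[i]) for i in range(len(x))
    )
    return 1.0 - area


# (country, IMR per 1000 live births, live births in thousands, infant deaths)
paho = [
    ("Bolivia", 59, 250, 14750),
    ("Peru", 43, 621, 26703),
    ("Ecuador", 39, 308, 12012),
    ("Colombia", 24, 889, 21336),
    ("Venezuela", 22, 568, 12496),
]
rows = sorted(paho, key=lambda r: r[1])       # ascending infant mortality rate
imr = [r[1] for r in rows]
births = [r[2] for r in rows]
deaths = [r[3] for r in rows]

print(f"Brown's Gb = {brown_gb(births, deaths):.4f}")    # 0.1904
print(f"Gini coefficient = {gini(imr):.5f}")             # 0.19893
```

Sorting the groups in descending order instead gives the same magnitude of Gb with a negative sign, because the cumulative curve then lies above the diagonal.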

Optimal use of Gini


The more observations or categories you have, the more reliable your calculated G
will be for assessing inequality.

The example above uses too few observations for reliable inference from G, but the
required definition of groups might force this type of situation. G based on few
observations is unreliable for comparing different groups at any one time, but it can
be reasonable for monitoring changes in inequalities over time.

Be aware that G can amplify biases. So, if you are comparing grouped data, make
sure that you are comparing like with like; for example, a comparison of the equality
of readmission rates between hospitals can be biased by case-mix differences.

When you are comparing Gini coefficients, particularly over time, note that the Gini
coefficient is insensitive to multiplying all observations by a constant, but it is
sensitive to adding a constant to all observations. An example of this issue could
occur if you were comparing the equality of life expectancy over time between
geographical areas; here the secular/baseline increase in life expectancy of the
overall population is in effect adding a constant, so there can be a change in the Gini
coefficient over time even if the absolute differences in life expectancy between the
areas remain constant.
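A small Python sketch makes the point concrete. The life expectancy figures are hypothetical values invented for this example, and the function name `gini` is illustrative.

```python
def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted observations."""
    xs = sorted(values)
    n = len(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * sum(xs))


life = [70, 74, 78, 82]                 # hypothetical life expectancies by area
scaled = [x * 1.1 for x in life]        # every value multiplied by a constant
shifted = [x + 5 for x in life]         # every value increased by 5 years

print(round(gini(life), 6))             # 0.032895
print(round(gini(scaled), 6))           # 0.032895 (unchanged by scaling)
print(round(gini(shifted), 6))          # 0.030864 (smaller after adding a constant)
```

Adding a constant leaves the pairwise differences unchanged but raises the mean, so G falls even though the absolute gaps between areas are the same.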

Note that G is a crude measure of inequality that might over-simplify a situation,
particularly where the issue of equality is multi-dimensional. If you are in doubt then
please consult a statistician.

Software
I have written a function in StatsDirect to produce bootstrap confidence intervals for
Gini coefficients and Lorenz plots. See http://www.statsdirect.com. I plan to extract
this Gini function into a freely-distributable Excel add-in in the future. It is not
currently practical to put this function into a web-based calculator as it is computer-
intensive, but this will become practical in the future with the growth of processing
power.

There is a Stata macro called ineqerr that will calculate bootstrap confidence
intervals for three different measures of inequality, including Gini. The results need
to be multiplied by n/(n-1) to get unbiased estimates (Dixon et al., 1987). See
http://www.stata.com.

Dixon et al. (1987) supply a SAS macro for bootstrapping Gini coefficients, available
from http://www.public.iastate.edu/~pdixon/sas/. See also http://www.sas.com.

Note that each run of the bootstrap usually gives slightly different answers due to the
random re-sampling nature of the method. In order to get consistent bootstrap
estimates, you should select at least 2000 replications when bootstrapping in any
software.
References
Brown M. Using Gini-style indices to evaluate the spatial patterns of health
practitioners: theoretical considerations and an application based on the Alberta
data. Social Science and Medicine 1994;38(9):1243-1256.

Cowell FA. Measuring Inequality (second edition; draft third edition (May 2000) at
http://darp.lse.ac.uk/Frankweb/Frank/pdf/measuringinequality2.pdf). Hemel
Hempstead: Harvester Wheatsheaf 1995.

Department of Health. Tackling Inequalities in Health. London: The Stationery Office
2001. http://www.doh.gov.uk/healthinequalities/

Dixon PM, Weiner J, Mitchell-Olds T, Woodley R. Bootstrapping the Gini
coefficient of inequality. Ecology 1987;68:1548-1551.

Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method.
Journal of the American Statistical Association 1997;92:548-560.

Gini C. Variabilità e mutabilità. 1912. Reprinted in Memorie di metodologica statistica
(Ed. Pizetti E, Salvemini T). Rome: Libreria Eredi Virgilio Veschi 1955.

Glasser C. Variance formulas for the mean difference and coefficient of
concentration. Journal of the American Statistical Association 1962;57:648-654.

Illsley R, Le Grand J. The measurement of inequality in health. In Health and
Economics (Ed. Williams A). Macmillan 1987.

Jenkins SP. The measurement of economic inequality. In Readings on Economic
Inequality (Ed. Osberg L). Armonk, New York: ME Sharpe 1991.

Le Grand J. Inequalities in health: some international comparisons. European
Economic Review 1987;31:182-191. Also reprinted in Economic Theory
and the Welfare State (Ed. Barr N). Cheltenham: Edward Elgar 2000.

Lorenz MO. Methods for measuring the concentration of wealth. Journal of the
American Statistical Association 1905;9:209-219.

Mills JA, Zandvakili A. Statistical inference via bootstrapping for measures of
inequality. Journal of Applied Econometrics 1997;12:133-150.

Pan American Health Organisation. Measuring Health Inequalities: Gini Coefficient
and Concentration Index. Epidemiological Bulletin of PAHO 2001;22(1):3-4.
http://www.paho.org/English/SHA/EB_v22n1.pdf

Sen A. On Economic Inequality. Oxford: Clarendon Press 1973.

Stuart A, Ord JK. Kendall's Advanced Theory of Statistics (6th edition). London:
Edward Arnold 1994.