
DEA Cross-efficiency in the R program

José Francisco Moreira Pessanha
Rio de Janeiro State University – UERJ
professorjfmp@hotmail.com

Alexandre Marinho
Institute for Applied Economic Research – IPEA and Rio de Janeiro State University – UERJ
alexandre.marinho@ipea.gov.br

Sonia Maria de Rezende
Rio de Janeiro State University – UERJ
profsoniauerj@gmail.com

Luiz da Costa Laurencel
Rio de Janeiro State University – UERJ
llaurenc.ntg@terra.com.br

Marcelo Rubens dos Santos do Amaral
Rio de Janeiro State University – UERJ
mrubens@ime.uerj.br

Abstract
This paper presents an implementation of DEA cross-efficiency models in the R environment.
R is free, open-source and highly extensible software that offers a variety of functions and
graphical routines for data analysis. We develop R code for both the aggressive and the
benevolent formulations of the DEA cross-efficiency model. To illustrate a practical
application of the code, we use inputs and outputs from Brazilian electricity distribution
utilities.
Keywords: Data Envelopment Analysis, Cross-efficiency, R programming language

1 Introduction
The classical DEA models assign weights to the input and output variables so as to optimize
the efficiency score of a decision making unit (DMU). The model assigns higher weights to
the strengths of the evaluated DMU. In other words, classical DEA models carry out a
self-appraisal. Consequently, the weights assigned to the variables can vary widely
between DMUs. This flexibility can result in null weights or in weighting schemes that are
inconsistent with a priori knowledge about the importance of the input and output
variables (RUIZ; SIRVENT, 2012). One way to avoid null and unrealistic weighting schemes is
to include weight constraints in the DEA model (BOGETOFT; OTTO, 2011). However, this
approach involves some arbitrariness and depends on a priori information about the relative
importance of each variable, and in most cases this information is not available.

An alternative to the imposition of weight constraints is cross-efficiency analysis
(SEXTON et al, 1986). Unlike the classical DEA models, cross evaluation carries out a peer
appraisal: the efficiency of each DMU is evaluated according to the optimal weighting
schemes of the other DMUs, that is, from the point of view of the other DMUs.

In this paper we show R (R DEVELOPMENT CORE TEAM, 2014) code for both the aggressive and
the benevolent formulations of the DEA cross-efficiency model. To illustrate the code we use
inputs and outputs from Brazilian electricity distribution utilities. R is free and
open-source software that allows analysts to build their own programs or packages and to
distribute them. Thus, analysts who use R can obtain low-cost solutions. The standard DEA
models and many extensions are available in R through the Benchmarking package (BOGETOFT;
OTTO, 2011). With lpSolve (BUTTREY, 2005), a mixed integer linear programming solver that is
fully integrated with R and callable from R at no cost, one can develop new DEA models.
Commercial or freeware programs are practical and contain many templates and resources that
facilitate the implementation of DEA, but they do not allow direct inspection of the details
of the problem under analysis. Moreover, unlike R, most free programs limit the number of
DMUs, the number of inputs and outputs, or the models they can run. A newcomer to DEA
modelling can become strongly tied or “addicted” to some expensive commercial software, or
even to some free package, because it is friendly and easy to use. In the long run, however,
this “addiction” may come at a high cost, because the user can lose an in-depth
understanding of the DEA models. Without this understanding, the user may never be able to
develop new models or to implement tailored solutions.

2 Classical DEA models


DEA is a widely used technique for evaluating the efficiency of a set of N peer decision
making units (DMUs) that convert multiple inputs into multiple outputs. In the general case,
a DMU uses multiple inputs X=(x1,...,xs) to produce multiple outputs Y=(y1,...,ym) and its
efficiency score is defined by the following quotient:

efficiency = (u1y1+…+umym)/(v1x1+…+vsxs) = UY/VX (1)

where V=(v1,...,vs) and U=(u1,...,um) denote the weights assigned to the inputs and outputs
quantities respectively.
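
As a quick numerical illustration of quotient (1), the sketch below evaluates the ratio for one hypothetical DMU. All quantities and weights are made up for the example, and the weights are simply assumed rather than optimized.

```r
# Hypothetical DMU with s = 2 inputs and m = 3 outputs (made-up numbers)
x <- c(10, 4)               # input quantities  X = (x1, x2)
y <- c(20, 5, 8)            # output quantities Y = (y1, y2, y3)
v <- c(0.05, 0.125)         # input weights  V (assumed, not optimized)
u <- c(0.02, 0.04, 0.025)   # output weights U (assumed, not optimized)

# Equation (1): weighted sum of outputs divided by weighted sum of inputs
efficiency <- sum(u * y) / sum(v * x)
print(efficiency)
```

With these numbers the weighted input equals 1, so the score is just the weighted output, 0.8.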

Charnes et al (1978) suggest that the vectors U and V be determined by the linear
programming problem (LPP) (2) in Table 1, known as the input-oriented CRS (Constant Returns
to Scale) model in the multiplier form. DMU j0 is fully efficient if its optimal efficiency
score θ = 1 and all weights are positive at the optimal solution. If θ = 1 but some weights
are equal to zero, DMU j0 is weakly efficient. Otherwise, if θ < 1, the DMU is inefficient
(COOK; ZHU, 2005).

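Table 1 DEA/CRS input oriented

Multiplier form:
\[
\begin{aligned}
\text{efficiency} = \max_{u,v}\ & \sum_{i=1}^{m} u_i\, y_{i,j_0} \\
\text{s.t.}\ & -\sum_{i=1}^{s} v_i\, x_{ij} + \sum_{i=1}^{m} u_i\, y_{ij} \le 0, \quad j = 1,\dots,j_0,\dots,N \\
& \sum_{i=1}^{s} v_i\, x_{i,j_0} = 1 \\
& u_i \ge 0,\ i = 1,\dots,m; \quad v_i \ge 0,\ i = 1,\dots,s
\end{aligned}
\tag{2}
\]

Envelopment form:
\[
\begin{aligned}
\text{efficiency} = \min_{\theta,\lambda}\ & \theta \\
\text{s.t.}\ & \theta X_{j_0} \ge \sum_{j=1}^{N} \lambda_j X_j \\
& Y_{j_0} \le \sum_{j=1}^{N} \lambda_j Y_j \\
& \lambda_j \ge 0, \quad j = 1,\dots,j_0,\dots,N
\end{aligned}
\tag{3}
\]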

Under the resource conservation approach (input orientation), the measure of technical
efficiency θ (0 ≤ θ ≤ 1) of a DMU is defined as the maximum radial contraction of the input
vector X that can still produce the same amount of outputs Y:

efficiency = Min {θ | (θX, Y) ∈ production possibility set T(X,Y)} (4)

Through duality theory we obtain the input-oriented DEA model in the envelopment form,
whose mathematical formulation corresponds to model (3) in Table 1. In this case, DMU j0 is
fully efficient if and only if θ = 1 and all slack variables are equal to zero. If θ = 1 but
some slack variables are positive, DMU j0 is weakly efficient. Otherwise, the DMU is
inefficient. It should be emphasized that LPP (2) or (3) must be solved for each DMU in
order to compute its efficiency score.
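
The envelopment model (3) can be sketched with lpSolve as follows. This is a minimal illustration on made-up data for three DMUs with one input and one output each; the function name `crs_envelopment` and the matrices `X` and `Y` are hypothetical and chosen only for this example. The decision variables are ordered as (θ, λ1, ..., λN), and lpSolve treats all of them as non-negative by default.

```r
library(lpSolve)  # LP solver used throughout this paper

# Toy data (made up): N = 3 DMUs, one row per DMU
X <- matrix(c(2, 4, 5), ncol = 1)  # inputs
Y <- matrix(c(2, 2, 4), ncol = 1)  # outputs
N <- nrow(X)

# Envelopment model (3) for DMU j0; variables are (theta, lambda_1, ..., lambda_N)
crs_envelopment <- function(j0) {
  f.obj   <- c(1, rep(0, N))          # minimize theta
  con_in  <- cbind(X[j0, ], -t(X))    # theta * x_j0 - sum_j lambda_j * x_j >= 0
  con_out <- cbind(0, t(Y))           # sum_j lambda_j * y_j >= y_j0
  f.con   <- rbind(con_in, con_out)
  f.dir   <- rep(">=", nrow(f.con))
  f.rhs   <- c(rep(0, ncol(X)), Y[j0, ])
  lp("min", f.obj, f.con, f.dir, f.rhs)$objval
}

# The LPP is solved once per DMU
theta <- sapply(seq_len(N), crs_envelopment)
print(round(theta, 4))
```

With a single input and a single output the CRS score reduces to each DMU's output/input ratio divided by the best ratio, so the scores here are 1, 0.5 and 0.8, which gives a simple check on the LP formulation.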

3 Cross-efficiency analysis
The cross-efficiency of a DMU s based on the weights of a DMU k is defined by the following
quotient, where uik and vjk are the optimal weights of DMU k applied to the outputs yis and
inputs xjs of DMU s, respectively:

\[
E_{ks} = \frac{\sum_{i \in \text{outputs}} u_{ik}\, y_{is}}{\sum_{j \in \text{inputs}} v_{jk}\, x_{js}}
\tag{5}
\]

In a set of N DMUs, the efficiency scores computed by the CRS model and the cross
efficiencies can be arranged in the cross-efficiency matrix (Table 2). The efficiency scores
from the CRS model lie on the diagonal. The k-th row shows the cross efficiencies computed
with the weights of DMU k, while the k-th column holds the cross efficiencies of the k-th
DMU computed with the weights of the other DMUs.

Table 2 Cross efficiency matrix


DMU 1 2 3 ... k ... N
1 E11 E12 E13 ... E1k ... E1N
2 E21 E22 E23 ... E2k ... E2N
... ... ... ... ... ... ... ...
k Ek1 Ek2 Ek3 ... Ekk ... EkN
... ... ... ... ... ... ... ...
N EN1 EN2 EN3 ... ENk ... ENN
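
Once a weight vector is available for every DMU, the whole matrix in Table 2 follows from quotient (5) in one matrix operation. The sketch below assumes a hypothetical layout in which row k of `U` and `V` holds the output and input weights of DMU k, and row s of `X` and `Y` holds the data of DMU s. The data and the weights are made up for illustration.

```r
# Toy data (made up): N = 3 DMUs, one input and two outputs
X <- matrix(c(2, 2, 2), ncol = 1)               # inputs, one row per DMU
Y <- matrix(c(4, 1,
              1, 4,
              2, 2), ncol = 2, byrow = TRUE)    # outputs, one row per DMU

# Assumed weights: row k holds the weights chosen by DMU k
V <- matrix(c(0.5, 0.5, 0.5), ncol = 1)         # input weights
U <- matrix(c(0.25, 0.00,
              0.00, 0.25,
              0.20, 0.20), ncol = 2, byrow = TRUE)  # output weights

# Equation (5): E[k, s] = (U[k, ] . Y[s, ]) / (V[k, ] . X[s, ])
E <- (U %*% t(Y)) / (V %*% t(X))
dimnames(E) <- list(paste0("weights_DMU", 1:3), paste0("DMU", 1:3))
print(E)
```

The diagonal of `E` holds the scores obtained with each DMU's own weights, and each column collects the peer evaluations of one DMU.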

The CRS DEA model can have multiple optimal solutions (weights) associated with the same
level of efficiency. Therefore, cross efficiencies based on the weights obtained by this
model would be generated arbitrarily (LIANG et al, 2008). To overcome this drawback,
secondary criteria have been proposed to select one set of weights among the alternative
optimal solutions (RUIZ; SIRVENT, 2012; WANG; CHIN, 2010). Examples are the aggressive and
the benevolent formulations proposed by Doyle and Green (1994). In the benevolent
formulation we seek weights that maximize the cross-efficiency of the other DMUs, while in
the aggressive formulation the weights minimize it. The aggressive formulation improves the
discrimination among the efficiency scores. In this case the cross-efficiency score of a
DMU s from the point of view of DMU k (Eks) is based on the weights u and v obtained from
LPP (6) in Table 3, where Ekk is the efficiency score of DMU k from the CRS model. The
benevolent formulation is given by LPP (7) in Table 3 (ESTELLITA LINS; ANGULO-MEZA, 2002).

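Table 3 Aggressive and benevolent formulations

Aggressive formulation:
\[
\begin{aligned}
\min_{u,v}\ & \sum_{i \in \text{outputs}} u_{ik} \sum_{s \ne k} y_{is} - \sum_{j \in \text{inputs}} v_{jk} \sum_{s \ne k} x_{js} \\
\text{s.t.}\ & \sum_{j \in \text{inputs}} v_{jk}\, x_{jk} = 1 \\
& \sum_{i \in \text{outputs}} u_{ik}\, y_{ik} - E_{kk} \sum_{j \in \text{inputs}} v_{jk}\, x_{jk} = 0 \\
& \sum_{i \in \text{outputs}} u_{ik}\, y_{is} - \sum_{j \in \text{inputs}} v_{jk}\, x_{js} \le 0, \quad \forall s \ne k \\
& u_{ik},\ v_{jk} \ge 0
\end{aligned}
\tag{6}
\]

Benevolent formulation:
\[
\begin{aligned}
\max_{u,v}\ & \sum_{i \in \text{outputs}} u_{ik} \sum_{s \ne k} y_{is} - \sum_{j \in \text{inputs}} v_{jk} \sum_{s \ne k} x_{js} \\
\text{s.t.}\ & \sum_{j \in \text{inputs}} v_{jk}\, x_{jk} = 1 \\
& \sum_{i \in \text{outputs}} u_{ik}\, y_{ik} - E_{kk} \sum_{j \in \text{inputs}} v_{jk}\, x_{jk} = 0 \\
& \sum_{i \in \text{outputs}} u_{ik}\, y_{is} - \sum_{j \in \text{inputs}} v_{jk}\, x_{js} \le 0, \quad \forall s \ne k \\
& u_{ik},\ v_{jk} \ge 0
\end{aligned}
\tag{7}
\]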

The cross-efficiency score ek of DMU k is the mean of the k-th column of the cross
efficiency matrix, excluding the self-efficiency (8). In addition, Doyle and Green (1994)
proposed the Maverick index Mk (9) to identify DMUs with unrealistic weight sets. The Mk
index of a DMU k is the relative deviation between the self-efficiency score (Ekk) and the
cross-efficiency score (ek). A high value of Mk suggests that the weighting scheme is
unrealistic. DMUs with Mk greater than 20% are called Mavericks. A DMU with self-efficiency
equal to 1 but a high Mk is a false positive.

\[
e_k = \frac{1}{N-1} \sum_{i \ne k} E_{ik}
\tag{8}
\]
\[
M_k = \frac{E_{kk} - e_k}{e_k}
\tag{9}
\]
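
Equations (8) and (9) and the 20% rule can be checked on a small example. The cross-efficiency matrix below is made up for illustration, and the object names are hypothetical.

```r
# Made-up cross-efficiency matrix: row k holds the scores computed with the weights of DMU k
E <- matrix(c(1.0, 0.6, 0.9,
              0.8, 0.9, 0.7,
              0.9, 0.6, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("DMU", 1:3), paste0("DMU", 1:3)))
N <- nrow(E)

self <- diag(E)                         # self-efficiency E_kk
e    <- (colSums(E) - self) / (N - 1)   # equation (8): column mean without the diagonal
M    <- (self - e) / e                  # equation (9): Maverick index

maverick       <- M > 0.2                           # 20% threshold quoted in the text
false_positive <- maverick & abs(self - 1) < 1e-9   # self-efficient but Maverick
print(round(cbind(self, e, M), 3))
print(data.frame(maverick, false_positive))
```

In this toy case DMU2 and DMU3 exceed the threshold, and only DMU3 is a false positive because its self-efficiency equals 1.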

4 R code for DEA cross-efficiency

To illustrate R code suitable for solving the problems described above, we use data on
Brazilian electricity distribution utilities available on the website of the Brazilian
Electricity Regulatory Agency – Aneel (www.aneel.gov.br). The dataset has been used in the
discussions about the tariff review methodology in Brazil. We emphasize that DEA has been
used by regulatory agencies and in evaluation activities around the world (MARINHO et al,
1997; JAMASB; POLLITT, 2000; PESSANHA et al, 2010; MARINHO; CASTANHEIRA JÚNIOR, 2013;
REZENDE et al, 2014). The R code in Table 4 downloads and extracts the data file. The data
file has 61 rows (utilities) and 11 columns (variables).

Table 4 R code to download and extract the data file of Brazilian electricity distribution utilities
library(downloader) ; library(xlsx) ; library(utils) # load packages
dir.create("c:/CARA"); setwd("c:/CARA") # create and set the working directory
url = "http://www.aneel.gov.br/aplicacoes/audiencia/arquivo/2014/023/documento/custos_operacionais_-_atualizada.zip"
file="OPEX.zip" ; download(url,file,mode="wb") # download file
unzip(file,files=NULL,list=FALSE,overwrite=TRUE,junkpaths=TRUE,exdir=".",unzip="internal",setTimes=FALSE) # extract spreadsheet
data=read.xlsx("Base DEA - atualizada.xlsx",header=TRUE,startRow=2, sheetIndex=1) # read spreadsheet
utility=sapply(strsplit(as.character(data[,1]),"_m"),`[`,1) # utility names (text before "_m"), as a character vector
opex=data[,4] # average operational expenditure in the period 2011-2013
customers=data[,8] # average number of customers in the period 2011-2013
energysale=data[,9] # average energy sales in the period 2011-2013
network=data[,5]+data[,6]+data[,7] # total network length (underground + overhead lines + high voltage)
datamatrix=cbind(opex,customers,energysale,network) # data matrix
rownames(datamatrix)=utility # label rows with utility names

The regulatory agency aims to assess the efficient level of the operational expenditure
(opex) of each utility. Given the size of the market it serves, a utility must operate at
the least cost defined by the efficient frontier. The deviation between the actual opex and
the least cost is attributed to inefficiency in the management of the utility. Based on the
efficiency score, the regulatory agency penalizes inefficiencies when it sets the tariff
level for the utilities. We therefore have an input-oriented DEA model in which opex is the
only input variable (s=1). The output variables are the drivers of opex: number of
customers, total energy sales and distribution network length (APPA et al, 2010), giving
three outputs (m=3).

DEA is based on the assumption that the DMUs are comparable. However, the Brazilian
distribution utilities are heterogeneous. To establish fair comparisons we applied the
Ward method (JOHNSON; WICHERN, 1998) to classify the 61 utilities into two groups: large
and small utilities. The cluster analysis was applied to the logarithm of the output
variables. The R code and the results from the Ward method are illustrated in Figure 1,
where the dendrogram shows that the two clusters are well separated. The cluster with the
large utilities accounts for 97% of the total energy sales. In this paper the DEA models
take into account only the large utilities.
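
The grouping step and the share of energy sales per cluster can be sketched on a small example. The six utilities below are hypothetical and their figures are made up. Only base R functions (`hclust`, `dist`, `cutree`, `tapply`) are used.

```r
# Made-up outputs for six hypothetical utilities (three large, three small)
toy <- data.frame(
  customers  = c(3.0e6, 2.5e6, 4.0e6, 5e4, 8e4, 3e4),
  energysale = c(9000, 8000, 12000, 150, 200, 100),
  network    = c(60000, 50000, 90000, 2000, 3000, 1500),
  row.names  = paste0("U", 1:6)
)

# Ward clustering on the logarithm of the outputs, cut into two groups
tree    <- hclust(dist(log(toy)), method = "ward.D")
members <- cutree(tree, k = 2)

# Share of total energy sales held by each cluster
share <- tapply(toy$energysale, members, sum) / sum(toy$energysale)
print(members)
print(round(share, 3))
```

`cutree` numbers the clusters in order of first appearance, so which label corresponds to the large utilities depends on the row order of the data and is worth checking before a subset is selected.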

The shape of the efficient frontier depends on the assumptions about returns to scale. For
purposes of incentive regulation the frontier should not allow decreasing returns to scale,
so the DEA CRS model is a good choice (APPA et al, 2010). This choice also satisfies the
requirements of the cross-efficiency analysis (SOARES de MELLO et al, 2013).

# cluster analysis
results = hclust(dist(log(datamatrix[,c(2,3,4)])),method="ward.D")
plot(results) # dendrogram
members = cutree(results,k=2) # 2 clusters
datamatrix=data.frame(cbind(datamatrix,members))
attach(datamatrix)
# select the DMUs classified in cluster 1 (large utilities)
index=which(members==1)
data_dea = datamatrix[index,]
N = dim(data_dea)[1] # number of DMU
s = 1 # number of inputs
m = 3 # number of outputs
inputs = data_dea[,1]
outputs = data_dea[,c(2,3,4)]
Figure 1. Cluster analysis

First, we run the CRS model to obtain the self-efficiency scores Ekk. They can be computed
by the R code in Table 5 (PESSANHA et al, 2015), where i is the index of the evaluated DMU.
The LPP is solved with the lpSolve package (BUTTREY, 2005):

Table 5 R code to compute DEA/CRS efficiency scores


library(lpSolve) # load lpSolve package previously installed
f.rhs = c(rep(0,N),1) # RHS constraints
f.dir = c(rep("<=",N),"=") # directions of the constraints
aux = cbind(-1*inputs,outputs) # matrix of constraint coefficients in (2)
for (i in 1:N) {
  f.obj = c(rep(0,s),t(data_dea[i,c(2,3,4)])) # objective function coefficients
  f.con = rbind(aux ,c(data_dea[i,1], rep(0,m))) # add normalization constraint (LHS)
  results = lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
  multipliers = results$solution # input and output weights
  efficiency = results$objval # efficiency score
  duals = results$duals # shadow prices
  if (i==1) {
    weights = c(multipliers[seq(1,s+m)])
    effcrs = efficiency
    lambdas = duals [seq(1,N)]
  } else {
    weights = rbind(weights,c(multipliers[seq(1,s+m)]))
    effcrs = rbind(effcrs , efficiency)
    lambdas = rbind(lambdas,duals[seq(1,N)])
  }
}
matrix_results = cbind(effcrs,weights,lambdas)
rownames(matrix_results) = rownames(data_dea)
colnames(matrix_results) = c("efficiency",colnames(data_dea)[1:(s+m)], rownames(data_dea))

Based on the CRS scores we can apply the R code in Table 6 to compute cross-efficiency
under the aggressive formulation.

Table 6 R code to compute cross-efficiency scores under the aggressive formulation
crosseff = matrix(0,nrow=N,ncol=N) # initialize cross efficiency matrix
for (i in 1:N) {
  totaloutputs = as.numeric(colSums(outputs))-as.numeric(outputs[i,]) # outputs summed over the other DMUs
  totalinputs = sum(inputs)-as.numeric(inputs[i]) # input summed over the other DMUs
  f.obj = c(totaloutputs,-totalinputs) # objective function in (6)
  aux1 = cbind(outputs,-1*inputs)[-i,] # constraints of the other DMUs
  aux1 = rbind(aux1,c(0*rep(1,m),as.numeric(inputs[i]))) # normalization constraint
  aux1 = rbind(aux1,c(as.numeric(outputs[i,]),effcrs[i]*as.numeric(-inputs[i]))) # keeps the self-efficiency Ekk
  f.con = aux1 ; f.rhs = rep(0,(N+1)) ; f.rhs[N] = 1
  f.dir = rep("<=",N+1) ; f.dir[N] = "=" ; f.dir[N+1] = "="
  weights0 = lp ("min", f.obj, f.con, f.dir, f.rhs,scale=0)$solution
  for (j in 1:N) {
    crosseff[i,j] = weights0[1:m]%*%t(outputs[j,])/( weights0[m+1]%*%(inputs[j]))
  }
}
rankingb = (N*apply(crosseff,2,mean)-diag(crosseff))/(N-1) # cross-efficiency score (8)
maverick = (effcrs-rankingb)/rankingb # Maverick index (9)
Table_agressive = t(rbind(as.numeric(effcrs),rankingb,t(maverick)))
colnames(Table_agressive) = c('CCR','cross_eff_agressive','Maverick')
rownames(Table_agressive) = rownames(data_dea)

By changing the minimization to a maximization in the code above, we can compute the
cross-efficiency scores under the benevolent formulation, as shown by the R code in Table 7.

Table 7 R code to compute cross-efficiency scores under the benevolent formulation


crosseff = matrix(0,nrow=N,ncol=N) # initialize cross efficiency matrix
for (i in 1:N) {
  totaloutputs = as.numeric(colSums(outputs))-as.numeric(outputs[i,]) # outputs summed over the other DMUs
  totalinputs = as.numeric(sum(inputs))-as.numeric(inputs[i]) # input summed over the other DMUs
  f.obj = c(totaloutputs,-totalinputs) # objective function in (7)
  aux1 = cbind(outputs,-1*inputs)[-i,] # constraints of the other DMUs
  aux1 = rbind(aux1,c(0*rep(1,m),as.numeric(inputs[i]))) # normalization constraint
  aux1 = rbind(aux1,c(as.numeric(outputs[i,]),effcrs[i]*as.numeric(-inputs[i]))) # keeps the self-efficiency Ekk
  f.con = aux1 ; f.rhs = rep(0,(N+1)) ; f.rhs[N] = 1
  f.dir = rep("<=",N+1) ; f.dir[N] = "=" ; f.dir[N+1] = "="
  weights0 = lp ("max", f.obj, f.con, f.dir, f.rhs,scale=0)$solution
  for (j in 1:N) {
    crosseff[i,j] = weights0[1:m]%*%t(outputs[j,])/( weights0[m+1]%*%t(inputs[j]))
  }
}
rankingb = (N*apply(crosseff,2,mean)-diag(crosseff))/(N-1) # cross-efficiency score (8)
maverick = (effcrs-rankingb)/rankingb # Maverick index (9)
Table_benevolent = t(rbind(as.numeric(effcrs),rankingb,t(maverick)))
colnames(Table_benevolent) = c('CCR','cross_eff_benevolent','maverick')
rownames(Table_benevolent) = rownames(data_dea)
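
Since the two formulations differ only in the optimization direction, one possible design is a single function that takes the direction as an argument. The sketch below is an illustration on made-up data (three DMUs, one input, two outputs), not the code used for the results in this paper. The names `self_eff` and `cross_eff` are hypothetical, and the decision variables are ordered as (u1, ..., um, v).

```r
library(lpSolve)

# Toy data (made up): one input and two outputs for N = 3 DMUs
inputs  <- c(2, 2, 2)
outputs <- matrix(c(4, 1,
                    1, 4,
                    2, 2), ncol = 2, byrow = TRUE)
N <- nrow(outputs); m <- ncol(outputs)

# Self-efficiency E_kk from the CRS multiplier model (2)
self_eff <- function(k) {
  f.con <- rbind(cbind(outputs, -inputs),        # u*y_j - v*x_j <= 0 for every DMU j
                 c(rep(0, m), inputs[k]))        # v*x_k = 1
  lp("max", c(outputs[k, ], 0), f.con,
     c(rep("<=", N), "="), c(rep(0, N), 1))$objval
}
effcrs <- sapply(seq_len(N), self_eff)

# Secondary goal: direction = "min" gives LPP (6), direction = "max" gives LPP (7)
cross_eff <- function(direction) {
  E <- matrix(0, N, N)
  for (k in seq_len(N)) {
    peers_y <- outputs[-k, , drop = FALSE]
    f.obj <- c(colSums(peers_y), -sum(inputs[-k]))
    f.con <- rbind(cbind(peers_y, -inputs[-k]),               # peers: u*y_s - v*x_s <= 0
                   c(rep(0, m), inputs[k]),                   # v*x_k = 1
                   c(outputs[k, ], -effcrs[k] * inputs[k]))   # u*y_k - E_kk*v*x_k = 0
    f.dir <- c(rep("<=", N - 1), "=", "=")
    f.rhs <- c(rep(0, N - 1), 1, 0)
    w <- lp(direction, f.obj, f.con, f.dir, f.rhs)$solution
    E[k, ] <- (outputs %*% w[1:m]) / (inputs * w[m + 1])      # row k of the matrix, equation (5)
  }
  E
}

E_aggressive <- cross_eff("min")
E_benevolent <- cross_eff("max")
print(round(E_aggressive, 3))
print(round(E_benevolent, 3))
```

In both matrices the diagonal reproduces the CRS scores, because the equality constraint pins the self-efficiency of the evaluating DMU. Only the off-diagonal entries change with the direction.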

Figure 2 shows the self-efficiency scores and the cross-efficiency scores under the
aggressive and benevolent formulations. The cross-efficiency scores are lower than the
self-efficiency scores in both formulations. The aggressive formulation identifies ten
mavericks (Maverick index above 0.2) and two false positives (Celtins, Piratininga). In the
benevolent formulation there are six mavericks and only one false positive (Celtins).

Figure 2. Cross efficiency (left panel: aggressive formulation; right panel: benevolent formulation)

5 Conclusions
This paper presented examples of how to implement DEA models in the R programming
language. We assessed the cross-efficiency of Brazilian electricity distribution utilities
under the aggressive and benevolent formulations. The cross-efficiency approach is an
alternative to the arbitrary imposition of weight constraints.

Cross evaluation provides efficiency scores that take peer evaluations into account. In
addition, it allows the robustness of the results from classical DEA models to be evaluated
without explicit assumptions about the relative importance of the variables.

References
APPA, G.; BANA e COSTA, C.A.; CHAGAS, M.P.; FERREIRA, F.C.; SOARES, J.O. DEA
in X-Factor evaluation for the Brazilian Electricity Distribution Industry, Working paper
LSEOR, 10-121, London School of Economics and Political Science, 2010.
BOGETOFT, P.; OTTO, L. Benchmarking with DEA, SFA and R. New York: Springer, 2011.
BUTTREY, S.E. Calling the lp_solve linear program software from R, S-Plus and Excel.
Journal of Statistical Software, v. 14, n. 4, May 2005. Available at:
<http://www.jstatsoft.org/article/view/v014i04>. Accessed on: 8 Feb. 2016.
CHARNES, A.; COOPER, W.W.; RHODES, E. Measuring the Efficiency of Decision
Making Units. European Journal of Operational Research, v. 2, p. 429-444, Nov. 1978.
COOK, W.D.; ZHU, J. Modeling Performance Measurement: applications and
implementation issues in DEA, New York: Springer, 2005.
DOYLE, J. R.; GREEN, R. H. Efficiency and Cross-efficiency in DEA: Derivations,
Meanings and Uses. Journal of the Operational Research Society, v. 45, n. 5, p. 567-578, May
1994.
ESTELLITA LINS, M. P.; ANGULO-MEZA, L. Review of methods for increasing
discrimination in Data Envelopment Analysis. Annals of Operations Research, v. 116, n. 1, p.
225-242, Oct. 2002.
JAMASB, T.; POLLITT, M. Benchmarking and regulation: International electricity
experience. Utilities Policy, v. 9, n. 3, p. 107-130, Sep. 2000.
JOHNSON, R. A.; WICHERN, D. W. Applied Multivariate Statistical Analysis. 4th ed. New
Jersey: Prentice-Hall, 1998.
LIANG, L.; WU, J.; COOK, W. D.; ZHU, J. Alternative secondary goals in DEA
cross-efficiency evaluation. International Journal of Production Economics, v. 113, n. 2,
p. 1025-1030, Jun. 2008.
MARINHO, A.; RESENDE, M.; FAÇANHA, L.O. Brazilian Federal Universities: relative
efficiency and data envelopment analysis. Revista Brasileira de Economia, v. 51, n. 4,
p. 489-508, Oct./Dec. 1997.
MARINHO, A.; CASTANHEIRA JÚNIOR, F.G. Assessing the efficiency and the
effectiveness of public expenditures on security in Brazilian states. In: Data Envelopment
Analysis and Performance Measurement: Proceedings of the 11th International Conference of
DEA. Eds: Banker, R. Emrouznejad, A., Bal, H. & Mehmte, A. C. Samsun, Turkey, 45-50,
2013.
PESSANHA, J.F.M.; FIGUEIRA de MELLO, M.A.R.; BARROS, M.; SOUZA, R.C.
Avaliação dos custos operacionais eficientes das empresas de transmissão do setor elétrico
Brasileiro: uma proposta de adaptação do modelo dea adotado pela ANEEL. Pesquisa
Operacional, Rio de Janeiro, v. 30, n. 3, p. 521-545, Dec. 2010.
PESSANHA, J.F.M.; MARINHO, A.; LAURENCEL, L.C.; SOUZA, M.V.P. Teaching data
envelopment analysis on undergraduate statistics courses. In: IASE 2015 SATELLITE
CONFERENCE, Rio de Janeiro, 2015.
R DEVELOPMENT CORE TEAM. R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria, 2014. Available at:
<http://www.R-project.org/>. Accessed on: 8 Feb. 2016.
REZENDE, S.M.; PESSANHA, J.F.M.; AMARAL, R.M. Avaliação cruzada das
distribuidoras de energia elétrica, Production, São Paulo, v. 24, n. 4, p. 820-832, Oct./Dec.
2014.
RUIZ, J. L.; SIRVENT, I. On the DEA total weight flexibility and the aggregation in
cross-efficiency evaluations. European Journal of Operational Research, v. 223, n. 3,
p. 732-738, Dec. 2012.
SEXTON, T. R.; SILKMAN, R. H.; HOGAN, A. J. Data Envelopment Analysis: Critique and
extensions. In: Measuring Efficiency: an assessment of Data Envelopment Analysis. Ed:
Silkman, R. H. San Francisco: Jossey-Bass, 1986.
SOARES de MELLO, J.C.C.B.; ANGULO-MEZA, L.; SILVEIRA, J.Q.; GOMES, E.G.
About negative efficiencies in cross evaluation BCC input oriented models. European Journal
of Operational Research, v. 229, n. 3, p. 732-737, Sep. 2013.
WANG, Y.M.; CHIN, K.S. Some alternative models for DEA cross-efficiency evaluation.
International Journal of Production Economics, v. 125, n. 1, p. 332-338, Nov. 2010.
