
Essays on Network Economics and Finance
Lorenzo Ductor Gómez



Supervisors: Prof. Maria Dolores Collado Vindel, Prof. Marco Jury van der Leij

A Dissertation Submitted to
the Departamento de Fundamentos del Análisis Económico
Universidad de Alicante

In Partial Fulfillment of the Requirements for the Degree of


Doctor of Philosophy
March 2012

To my parents.

Acknowledgements
I owe every word of this doctoral thesis, from the first to the last, to my parents. I will be eternally grateful to them for all the effort they have made to turn this dream into reality. I am tremendously proud of and happy with the family I have; thank you very much, Antonio and Ana María. Without you this would not have been possible.
Thanks to my grandparents and my sister for all their support since I began my academic career in Granada.
Many people have contributed to this thesis. Above all, I am indebted to my advisors, Maria Dolores Collado Vindel and Marco van der Leij. I am tremendously grateful to Maria Dolores, whose support, encouragement, patience and mentorship have been crucial to the development of this dissertation.
I owe my deepest gratitude to Marco. He introduced me to the exciting and increasingly popular world of the economics of networks. He also introduced me to his personal networks and gave me the opportunity to visit him at the University of Cambridge, where we started a joint article with Sanjeev Goyal and Marcel Fafchamps. I am extremely grateful to Marco, Sanjeev and Marcel for sharing with me part of their work, the EconLit database, and for their support, comments and advice during my Ph.D. studies. From my experience with them, I can claim that peer effects and network effects are present in intellectual collaboration.
I am very grateful to Yann Bramoullé, Jordi Caballé, Pierre-Philippe Combes, Juan Carlos Conesa, Habiba Djebbari, Daryna Grechyna, Gergely Horvath, Silvia Martinez and Francesco Serti for their helpful comments.
Part of this dissertation was written at the University of Cambridge and at the Groupement de Recherche en Economie Quantitative d'Aix-Marseille (GREQAM). I would like to thank the Faculty of Economics at the University of Cambridge and GREQAM for their hospitality and for welcoming me as a Ph.D. student. Special thanks go to Sanjeev Goyal in Cambridge and Yann Bramoullé at GREQAM for their enthusiastic discussions. I learned a great deal from them.
I would like to express my gratitude to the professors of the postgraduate courses: Luisa Fuster, Gabriel Perez Quiros, Climent Quintana-Domeque, Marco van der Leij and especially Peter Kennedy, for teaching us how to teach and do applied econometrics. Rest in peace, Peter; I will always remember you as an enthusiastic and passionate teacher and one of the best professors I have ever had. I thank the administrative staff, Marilo Rufete and Josefa Zaragoza, for their help and support.
I would like to express my gratitude to my colleagues in the Quantitative Economics Doctorate, particularly Carlos Aller, Gustavo Cabrera, Gergely Horvath, Danilo Leiva and Xavier del Pozo, for their support and friendship.
Finally, I will be eternally grateful to Daryna Grechyna for her support and for staying with me during the worst moments of my Ph.D. studies. This thesis would not have been possible without her encouragement.
I gratefully acknowledge financial support from the Spanish Ministry of Science and Innovation (Programa Formación del Profesorado Universitario).

Contents

Introduction
  0.1 First Part
    0.1.1 Co-authorship and Academic Productivity
    0.1.2 Social Networks and Academic Output
  0.2 Second Part
    0.2.1 Excess Financial Development and Economic Growth

1 Co-authorship and Individual Academic Productivity
  1.1 Introduction
  1.2 Estimation framework
  1.3 Data
    1.3.1 Definition of the variables
    1.3.2 Descriptive analysis
  1.4 Results
    1.4.1 Does co-authorship lead to higher academic productivity?
    1.4.2 Co-authorship and productivity across individual types
  1.5 Robustness
    1.5.1 Not appropriate proxy variables
    1.5.2 Co-authorship tie duration
    1.5.3 Research overlap and coauthors' coauthor productivity
  1.6 Conclusions
  1.7 Appendix

2 Social networks and research output
  2.1 Introduction
  2.2 Framework
    2.2.1 Empirical strategy
  2.3 Data
    2.3.1 Definition of variables
    2.3.2 Descriptive statistics
  2.4 Empirical findings
    2.4.1 Predicting future output
    2.4.2 Networks and career cycle
    2.4.3 Network information across productivity categories
  2.5 Robustness
  2.6 Exploring the channels
    2.6.1 Career cycle effects and network variables
    2.6.2 Predictive value of output on networks
    2.6.3 Duration of predictive value
    2.6.4 Diversity
  2.7 Concluding remarks

3 Excess Financial Development and Economic Growth
  3.1 Introduction
  3.2 Financial Development and Growth
    3.2.1 Methodology
    3.2.2 Data
    3.2.3 Estimation Results
  3.3 Financial Development, Real Sector, and Growth
    3.3.1 Data
    3.3.2 Estimation Results
      3.3.2.1 Impact of excess financial development on long-run economic growth: Cross-sectional analysis
      3.3.2.2 Impact of excess financial development on short-run economic growth: System GMM
  3.4 Discussion
  3.5 Conclusions
  3.6 Appendix

References

Introduction
This doctoral thesis consists of two clearly distinct parts. The first part comprises two chapters that contribute to the empirical literature on social networks, an emerging branch of modern economics. Social interactions, represented as networks or graphs, are present in practically every economic activity. Consequently, the evaluation of economic policies should take into account the impact of these interactions on both individual actions and economic outcomes. The study of social networks in economics has grown steadily in importance since the essay by Granovetter (1985). Since then, social network theory has been applied to numerous economic topics, for example unemployment and wage inequality (Calvó-Armengol and Jackson, 2004), the diffusion of knowledge and innovation (Bala and Goyal, 1998) and the provision of local public goods (Bramoullé and Kranton, 2007), among many others. See Goyal (2011) for a survey of the theoretical and empirical literature on social networks in economics and Jackson (2008) for a synthesis of the models and techniques used to analyze social networks.
The first part of this thesis focuses on the possible externalities inherent in academic co-authorship networks. Understanding these potential externalities in scientific collaboration is vital for evaluating policies that aim to promote intellectual collaboration. Such policies have been implemented on the presumption of a positive relationship between scientific collaboration and productivity.
The first chapter rigorously tests the impact of co-authorship on the productivity of academic authors, measuring productivity by the quality of the journal in which an article is published, its length and the number of articles published in a given period. The causal relationship between co-authorship and academic productivity is identified by exploiting information on the author's past co-authorship network.
The second chapter, written in collaboration with Marcel Fafchamps, Sanjeev Goyal and Marco van der Leij, evaluates how informative an author's co-authorship network is for predicting the individual's performance. The results suggest that recruiters would benefit from obtaining information on the co-authorship network, the most informative factor being the productivity of an author's co-authors.
The second part of the thesis studies potential causes of financial crises. In particular, the third chapter, co-authored with Daryna Grechyna, analyzes the impact on economic growth of excess financial development, defined as the differential between the growth rates of the financial and industrial sectors. The existence of excess finance is justified by the theory of informational overshooting. We show that sustainable economic growth requires balanced growth in both sectors, financial and productive. When financial development exceeds industrial development by 4.5% (measured in terms of growth rates), the resources invested in production will exceed the productive capacity of the economy, giving rise to a financial crisis. Below, I describe the content of each chapter in more detail.

0.1 First Part

0.1.1 Co-authorship and Academic Productivity
This chapter analyzes the impact of academic collaboration on the productivity of academics. Over recent decades, scientific collaboration and government policies designed to induce co-authorship have increased simultaneously. Examples of such policies are the EU-funded research networks (Commission of European Communities) and, at the national level, the Ingenio 2010 Program (Ministry of Education and Science, 2006). In both programs, the research project for which funding is requested must be carried out in collaboration with other researchers. Another type of policy that induces collaboration is internal departmental policy that requires a minimum number of publications from faculty but does not discount those publications by the number of academics working on the article.

Academic collaboration could affect productivity through several channels, both positive and negative. On the positive side, teamwork combines the ideas and talent of different individuals and allows a division of labor and a diffusion of knowledge that could not occur without co-authorship. On the negative side, collaboration entails communication costs, facilitates free riding and requires commitment from every team member, all of whom must agree on the ideas, the methodology and the text of the article. The effect of co-authorship on academic productivity therefore depends on which of these factors dominates.

The empirical literature on the relationship between co-authorship and academic productivity has grown in recent years, but there is no consensus on whether the effect of collaboration on productivity is positive, negative or nonexistent. Laband and Tollison (2000), Presser (1980) and Zuckerman and Merton (1973) show that co-authored scientific articles are more likely to be accepted for publication than single-authored ones. Chung, Cox and Kim (2009) conclude that articles with more authors are cited more frequently. Other authors find no relationship between the two variables, such as Medoff (2003), Lee and Bozeman (2005) and Acedo, Barroso, Casanueva and Galán (2006). More surprising is the negative relationship found by Hollis (2001): using a panel of 5277 articles, he concludes that the more an individual co-authors, the lower the academic productivity attributed to that individual.
A key factor in the formation of co-authorship, besides the authors' own characteristics, is the author's opportunity set, that is, the quality of the projects the author conceives and of the projects offered by his or her co-authors. For example, an author may decide to collaborate with another because he or she lacks high-quality ideas. Ignoring these selection effects would lead to spurious results and therefore to erroneous conclusions. With the exception of Lee and Bozeman (2005), no article has attempted to control for the potential endogeneity of scientific co-authorship. The first chapter of this thesis attempts to fill this gap in the literature using a panel of economists between 1970 and 1999. The data were obtained from EconLit, a bibliography of economics journals compiled by the editors of the Journal of Economic Literature. The productivity variable is based on the work of Fafchamps, Goyal and van der Leij (2010) and combines quantity (number and length of published articles) with quality (the journal's impact index).

Because an individual's characteristics and opportunity set are endogenously related to both co-authorship and productivity, the individual's academic collaboration is instrumented by the common research interests between the author and his or her potential future co-authors, where potential co-authors are the co-authors of the co-authors accumulated in the past. As Fafchamps, Goyal and van der Leij (2010) emphasize, one of the most important determinants of starting an academic collaboration is that both authors share research interests. On the other hand, collaboration is unlikely when the skills of the two researchers overlap too much. We therefore include the square of this variable to capture potential nonlinearities between collaboration and the common-interests variable. Both instruments capture the idea of homophily, the tendency of individuals to interact with others who have similar characteristics.
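
The instrumental-variable idea described above can be illustrated with a minimal two-stage least squares sketch on synthetic data. Everything in it is an assumption made for illustration: the variable names (`overlap`, `coauthorship`, `ability`), the coefficients and the data-generating process are hypothetical, and the sketch leaves out the panel structure and individual fixed effects used in the chapter. It only shows how an instrument and its square can recover a causal coefficient when an unobserved factor biases ordinary least squares.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x_endog, instruments):
    """2SLS with a constant; returns [intercept, slope on x_endog]."""
    const = np.ones(len(y))
    Z = np.column_stack([const, instruments])  # constant plus instruments
    X = np.column_stack([const, x_endog])      # constant plus endogenous regressor
    X_hat = Z @ ols(Z, X)                      # first stage: fitted regressors
    return ols(X_hat, y)                       # second stage

# Hypothetical data-generating process (not taken from the thesis).
rng = np.random.default_rng(0)
n = 20_000
true_effect = 0.5

ability = rng.normal(size=n)       # unobserved by the econometrician
overlap = rng.uniform(size=n)      # research overlap with potential co-authors
# Co-authorship rises with overlap at a decreasing rate and also with ability.
coauthorship = (4 * overlap - 3 * overlap**2 + 0.8 * ability
                + rng.normal(scale=0.5, size=n))
productivity = (true_effect * coauthorship + ability
                + rng.normal(scale=0.5, size=n))

beta_ols = ols(np.column_stack([np.ones(n), coauthorship]), productivity)[1]
beta_iv = two_stage_least_squares(
    productivity, coauthorship, np.column_stack([overlap, overlap**2])
)[1]

print(f"OLS slope:  {beta_ols:.2f}")
print(f"2SLS slope: {beta_iv:.2f}")
```

In this synthetic setting the OLS slope is biased upward because ability raises both co-authorship and productivity, while the 2SLS slope should lie close to the assumed `true_effect`.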
The results suggest that, after controlling for the endogeneity of co-authorship and for unobserved individual heterogeneity, there is a positive relationship between intellectual collaboration and academic productivity. This effect is not homogeneous, however, and varies significantly with the authors' ability: the most able authors obtain a benefit from co-authorship three times larger than that of individuals with lower ability. The empirical results indicate that academics can improve their performance by co-authoring with one another. For governments and institutions, the results justify policies that stimulate intellectual collaboration.

0.1.2 Social Networks and Academic Output

This chapter analyzes how knowledge of a researcher's social network, as reflected in co-authorships with other authors, helps us predict his or her future productivity more accurately. Good recruitment requires an adequate prediction of a candidate's potential. Sports clubs, academic departments and even private firms use candidates' past performance as a guide to their potential. In this chapter we focus on academic researchers. Social interactions are an important aspect of research activity: academic authors discuss and comment on each other's work, referee the work of others for publication and collaborate on research projects. Scientific collaboration involves the exchange of opinions and ideas and facilitates the generation of new ideas, which may result in higher productivity in the future. We therefore expect that, ceteris paribus, individuals who are better connected or occupy more central positions in the network will be more likely to receive ideas and knowledge from others, thereby increasing their future productivity.

Centrality and proximity arise from connections created by the individuals themselves, so these connections may reflect unobservable individual characteristics such as ability, sociability and ambition. For example, a connection between a young, unknown economist and a highly prestigious economist may reveal positive characteristics of the student (ambition, effort, etc.) that would otherwise be unobservable. These considerations suggest that the academic collaboration network is related to research output in two ways: first, as a conduit of ideas and, second, as a signal of researchers' individual quality. The first suggests a causal relationship between the network and academic output; the second suggests that the network merely reflects individual characteristics.
As is well known in the empirical literature on social interactions (Manski, 1993; Moffitt, 2001), identifying network effects in a causal sense is a challenge in the absence of randomized experiments. This chapter contributes to that literature. Traditionally, economists have studied how social interactions can affect the behavior of particular groups, paying special attention to the difficulty of empirically identifying peer or network effects. For surveys of this work, see Moffitt (2001) and Glaeser and Scheinkman (2002). More recently, interest has centered on the mechanisms through which social networks influence the behavior and performance of economic agents. Recent empirical articles estimating network effects include Bramoullé, Djebbari and Fortin (2009), Conley and Udry (2008) and Calvó-Armengol, Patacchini and Zenou (2008). The difficulty of identifying network effects without randomized experiments lies in the endogeneity of network connections (which are not random) and in factors correlated with both an individual's unobservable characteristics and the probability of forming new connections.

In this article we take an alternative approach: we investigate how recent and past information on the collaboration network contributes to predicting a researcher's future productivity. First, we study whether co-authorship network variables provide relevant information about a researcher's future productivity once the individual's recent and past publications are known. We then investigate which variables are most informative and how their predictive power varies over an author's academic career.

Our first conclusion is that incorporating information on the co-authorship network leads to a modest improvement in the accuracy of forecasts of individual output beyond what can be predicted from past individual output. In addition, network variables such as the productivity of co-authors, closeness centrality and the number of co-authors contain valuable information about an individual's future productivity. We also find that the signaling effect contained in the network is quantitatively more important than the flow of ideas or information. Our second conclusion is that the predictive power of the co-authorship network is highest for young researchers and declines systematically over an author's academic career. By contrast, recent and past productivity retains its predictive power for the author's academic output throughout his or her academic life. Our third conclusion is that the predictive value of the co-authorship network is non-monotonic in the author's past productivity. On the one hand, network variables do not help to predict the performance of individuals whose initial productivity is below the mean. On the other hand, their predictive power is high for individuals among the most productive (top 1) and for those whose initial productivity is below the mean.
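
Two of the network variables named above, the number of co-authors (degree) and closeness centrality, can be computed from a list of papers as in the following sketch. The definitions used here are one standard choice and are an assumption: the thesis defines its own variables in Section 2.3.1, which is not part of this introduction. The function names and the toy author labels are hypothetical.

```python
from collections import deque

def coauthor_graph(papers):
    """Build an undirected co-authorship graph from lists of author names."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a in authors:
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph

def degree(graph, author):
    """Number of distinct co-authors."""
    return len(graph[author])

def closeness(graph, author):
    """Reachable authors divided by the sum of shortest-path lengths to them."""
    dist = {author: 0}
    queue = deque([author])
    while queue:                      # breadth-first search from `author`
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Toy example: A has written with B, C and D; D has also written with E.
papers = [["A", "B"], ["A", "C"], ["A", "D"], ["D", "E"]]
g = coauthor_graph(papers)
for name in sorted(g):
    print(name, degree(g, name), round(closeness(g, name), 3))
```

In the toy network, author A has the most co-authors and the highest closeness, while E, connected only through D, is the most peripheral.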
The results of the first part of the thesis reflect the importance both of co-authorship for increasing academic productivity and of the co-authorship network for predicting a researcher's potential.
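
The forecasting question behind the second chapter, whether network variables improve predictions once past output is known, can be sketched as an out-of-sample comparison of two linear models. The data below are synthetic and the coefficients are hypothetical; the thesis's actual specification and evaluation procedure (Section 2.2.1) are not reproduced here.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit a linear model by least squares and predict on held-out rows."""
    coef = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
    return X_test @ coef

def rmse(y, y_hat):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical data: future output depends on past output and on the
# productivity of co-authors, which is itself correlated with past output.
rng = np.random.default_rng(1)
n = 4_000
past_output = rng.normal(size=n)
coauthor_productivity = 0.5 * past_output + rng.normal(size=n)
future_output = (0.6 * past_output + 0.5 * coauthor_productivity
                 + rng.normal(size=n))

const = np.ones(n)
X_base = np.column_stack([const, past_output])
X_net = np.column_stack([const, past_output, coauthor_productivity])

train, test = slice(0, n // 2), slice(n // 2, n)
rmse_base = rmse(future_output[test],
                 fit_predict(X_base[train], future_output[train], X_base[test]))
rmse_net = rmse(future_output[test],
                fit_predict(X_net[train], future_output[train], X_net[test]))

print(f"RMSE, past output only:        {rmse_base:.3f}")
print(f"RMSE, plus network variable:   {rmse_net:.3f}")
```

A lower held-out error for the second model is what "network variables carry predictive information beyond past output" means in this sketch; it says nothing about whether the network has a causal effect.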

0.2 Second Part

0.2.1 Excess Financial Development and Economic Growth
The second part of this thesis analyzes excess credit or financial instruments as one of the main causes of certain economic crises. Financial intermediaries that mitigate problems of asymmetric information or facilitate transactions have a positive impact on economic growth (Levine, Loayza and Beck, 2000). However, recent crises suggest that excess financial development could harm economic growth under certain circumstances. The traditional, optimistic approach has focused on a linear relationship between the two variables. These studies conclude that financial development facilitates economic growth and convergence across countries. Examples are Levine, Loayza and Beck (2000), Aghion, Howitt and Mayer-Foulkes (2005) and Michalopoulos, Laeven and Levine (2009). The current approach, following the 2007 financial crisis, is more pessimistic and focuses on the banking fragility that the development of financial instruments may cause. For example, Arcand, Berkes and Panizza (2011), who re-evaluate the work of Levine, Loayza and Beck (2000), consider a non-monotonic relationship between financial development and economic growth. Under this assumption, they find a threshold beyond which financial development has a negative impact on economic growth. Beck, Chen, Lin and Song (2012) evaluate the relationship between financial innovation in the banking sector and bank fragility, real-sector growth and real-sector volatility. They show that financial innovation is associated with higher growth volatility in industries that depend more heavily on external finance. They also find a positive association between financial innovation and the idiosyncratic risk of bank fragility, higher volatility of bank profits and larger bank losses during the recent crisis of 2007.
This article contributes to the literature on financial development and growth by proposing the growth of the industrial sector (manufactured goods and energy) and the growth of the financial sector (financial services) as two factors that jointly determine a country's economic growth. We suggest that a sustainable path of economic development requires balanced technological progress in both sectors, industrial and financial. Technological progress in the real sector expands the production capacity of the economy, while the financial sector allows these new capacities to be used efficiently. By analyzing the state of technological progress and financial development over the last four decades, we test whether excess financial development, defined as the difference between financial and technological progress, harms economic growth and consequently leads to a financial crisis.

To explore the effect of excess financial development on short-run economic growth, we use panel data for 33 OECD countries. We consider a growth model in which the dependent variable is the growth rate of GDP per capita. Our main variables of interest are the differential between the growth rates of the financial and industrial sectors and its square, both of which proxy for excess financial development. We also include industrial-sector growth as a control, since we are interested in variation in the differential that is not determined by industrial-sector growth. As Kaldor (1967) pointed out, the growth rate of the industrial sector is the most important factor for predicting economic growth.
However, part of the variation in economic growth remains unexplained, and we focus on explaining some of that variation through the financial sector. The remaining control variables in the growth model are the level of GDP per capita at the beginning of each five-year period, a measure of the country's trade openness, inflation, government expenditure as a share of GDP and a proxy for human capital. Since all these factors are endogenous in the model and the lag of GDP is correlated with the model's errors by construction, we use the generalized method of moments (GMM) estimator developed for dynamic panel data by Arellano and Bover (1995) and extended by Blundell and Bond (1998). To avoid underestimating the standard errors, we implement the correction proposed by Windmeijer (2005). In addition, we use only the second lag of each endogenous variable as an instrument, to avoid overfitting the endogenous variables. This estimator identifies the effect of excess financial development on economic growth under the assumption that future shocks to economic growth are uncorrelated with current financial development. This assumption is plausible given that we use five-year windows as periods, and the Hansen and Sargan tests support the validity of the instruments.

The results show that financial development has a positive impact on growth in the short run. However, when the differential between financial-sector and industrial-sector growth reaches 4.45%, the effect of financial development on economic growth becomes negative. Analogously, we use the differential between private credit as a share of GDP and industrial output as a share of GDP, together with its square, as proxies for excess financial development. Under this alternative specification, financial development has a negative impact on economic growth when private credit as a share of GDP exceeds the ratio of industrial GDP to total GDP by 43.3%. These results are consistent with our hypothesis of excess financial development.

The theoretical justification for these results rests on the theory of informational overshooting introduced by Rob (1991) and Zeira (1994) and later applied by Zeira (1999) to explain the credit crunch. According to this theory, the economy has a limited and unknown productive capacity. This limit may be due to limited technology, limited demand or scarce resources. Rational agents use all available information to form expectations about the limit. As long as the limit has not been reached, expectations become increasingly optimistic. Eventually expectations are so optimistic that the economy overshoots its capacity: the resources invested in production are too large relative to production possibilities. Expectations, investment and economic activity then fall, giving rise to a severe economic downturn.

We propose to regard a country's technological progress as the main source of growth in the economy's capacity. Indeed, the introduction of new technologies and the invention of new goods and materials substitute for scarce factors of production such as natural resources and labor. We further propose to regard financial development as a driver of economic activity. As the technological possibilities of the economy expand, the demand for financial services also increases (see, for example, Aghion, Howitt and Mayer-Foulkes (2005) for an explanation based on Schumpeter's model of economic growth). Financial development is therefore a crucial determinant of growth. Nevertheless, when new financial technologies or instruments are introduced at a faster rate than new production technologies, the speed at which the economy approaches its capacity increases. Consequently, excess financial development could reduce economic growth because of a higher probability of overshooting productive capacity. This model introduces no market friction other than the scarcity of information. The theory of informational overshooting has been used to explain crises in market economies by Barbarino and Jovanovic (2007) and Bruno, Rochet and Woolley (2009).
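
The thresholds reported above for the excess financial development measure can be related to the quadratic specification as follows. The notation is assumed here for illustration and is not taken from the chapter: $g_{it}$ is per capita GDP growth, $D_{it}$ the excess financial development measure, $X_{it}$ the vector of controls and $\eta_i$ a country effect.

```latex
g_{it} = \beta_1 D_{it} + \beta_2 D_{it}^2 + \gamma' X_{it} + \eta_i + \varepsilon_{it},
\qquad
\frac{\partial g_{it}}{\partial D_{it}} = \beta_1 + 2\beta_2 D_{it}.
% With \beta_1 > 0 and \beta_2 < 0, the marginal effect changes sign at
D^{*} = -\frac{\beta_1}{2\beta_2}.
```

If the reported thresholds are read as the point where the marginal effect changes sign, the 4.45% and 43.3% figures would correspond to $D^{*}$ in the two specifications; the estimated coefficients themselves are reported in Chapter 3 and are not reproduced here.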


FIRST PART

Chapter 1
Co-authorship and Individual Academic Productivity

1.1 Introduction

In recent decades, governmental policies aimed at inducing collaboration have increased. These policies are based on the assumption that intellectual collaboration results in productivity gains for the researchers. Some examples of these
policies are the EU-funded research networks (Commission of European Communities, 2006) and the national Spanish Ingenio 2010 Program (Ministry of Education and Science, 2006). In both programs, researchers are required to collaborate
as a condition for obtaining research funding.1 Other examples of policies inducing
academics to collaborate are internal departmental policies (such as evaluations
or rankings and employment or tenure decisions that require a minimum number
of publications) that do not fully discount articles by the number of authors.
In addition, scientific collaboration between authors has substantially increased over recent decades. Indeed, the data show that the share of papers written by more than one author stood at 42% during the 1970s, while the proportion of co-authored articles was close to 60% in the 1990s. Several authors have provided explanations for this increase, including greater gains from specialization and the division of labor, falling communication costs and greater pressure to publish, among others.2

1 The purpose of the EU-funded network is to facilitate the sharing of research resources and, as a formally organized and coordinated structure, to sustain knowledge sharing among partners. The aim of the funding is to enhance the research potential of participants through the benefits of collaboration (Defazio et al., 2009).

Consequently, scientific collaboration is conditioned by a scientific policy that
has progressively stimulated intellectual collaboration (Melin & Persson, 1996).
If intellectual collaboration does not lead to gains in terms of research output, a
policy change is required.
A need has therefore arisen to study the effect of collaboration on academic
productivity and to answer these important questions: Does co-authorship lead to
higher academic productivity? Is the effect of co-authorship the same for every
individual? What are the channels through which collaboration might affect
individual productivity? This chapter addresses these questions.
Using data on economists over a 30-year period, from 1970 to 1999, I find that,
after accounting for the endogeneity inherent in the co-authorship formation
process through an instrumental variable strategy, co-authorship leads to higher
individual academic productivity. However, this effect varies significantly between
low- and high-productivity individuals.
On the one hand, co-authorship might positively affect individual productivity through several channels: teamwork may be more productive than working
alone as larger teams combine the ideas and talents of individuals (Chung et al.,
2009). Another advantage is the reduction of time devoted to a project, as teams
enable the researcher to start more projects with other authors at the same time.
Hudson (1996) proposes the division of labor as one of the main advantages of
coauthoring in economics. Apart from these advantages, coauthoring should lead
to higher productivity if knowledge spillovers, i.e. network and/or peer effects,
are present.3

2 For instance, Laband and Tollison (2000) compare the incidence and extent of formal co-authorship observed in economics against that observed in biology, and discuss the causes and consequences of formal co-authorship in both disciplines. Hudson (1996) analyzes the increase in co-authorship and the reasons for it. Goyal et al. (2006) study the emergence of an economics small world and increasing co-authorship as a factor explaining this phenomenon.
3 Azoulay (2010) and Waldinger (2010) examine peer effects in science using the unanticipated removal of individuals as a natural experiment.


On the other hand, intellectual collaboration might also decrease individual productivity. According to Hudson (1996), the main disadvantages of collaboration are that it involves compromises as well as communication and organization costs. When working in a group, individual authors have to agree on ideas, texts, approaches or even conclusions proposed by others. These compromises may lead to a reduction in risk taking that might affect the final quality of the article. Hudson (1996) likewise suggests that there is a free-rider problem: the more authors write a paper, the easier it is for any one author to contribute less to the project. Another possible disadvantage is the negative externality arising from the time that collaborators devote to other projects with other authors (Jackson & Wolinsky, 1996). I call this a congestion externality. The main idea is that if our coauthors are busy because they are working on many projects at the same time, they have less time to devote to our project. We therefore have to devote more time to the collaboration and have less time for other work.
The literature examining the relationship between co-authorship and academic productivity has grown over recent years. However, there is no agreement as to whether this relationship is positive, negative or non-existent. Laband and Tollison (2000) provide evidence that co-authored scientific papers are more likely to be accepted for publication than sole-authored papers.4 More recently, Chung et al. (2009) test the relation between intellectual collaboration and the quality of intellectual output using academic papers published in prestigious finance journals. They find that papers with more authors are cited more often and that papers with four authors are cited most often.
In contrast to the above studies, which suggest that coauthoring and productivity are positively related, Medoff (2003) shows that collaboration does not result in significantly higher-quality research in economics.5 In addition, using individual panel data on 5277 journal publications, Hollis (2001) finds that output is negatively related to co-authorship: the more an individual coauthors, the lower is the research output attributable to that individual. Greater collaboration appears to lead to more frequent, longer, and better publications, but when the publications are discounted by the number of authors, the relationship between research output and teamwork becomes negative. Estimates of the size of this effect are that, on average, adding one more author is associated with a per capita output reduction of between 7% and 20%. Hollis (2001) provides several reasons for the negative relationship: mismeasurement (i.e. a wrong measure of productivity), the free-rider problem, coordination costs, future but not current benefits from collaboration, etc. Hollis also points out the potential endogeneity of team formation, that is, the amount of co-authorship depends on the opportunity set the individual faces. He defines the opportunity set as the projects conceived by the author together with the projects offered to the individual as a coauthor. This study focuses on the endogeneity of co-authorship formation as the main econometric problem driving this negative relationship.

4 Presser (1980) and Zuckerman and Merton (1973) also find a positive correlation between co-authored articles and the probability of acceptance.
5 More recently, Acedo et al. (2006) find very weak evidence that co-authored management papers are of higher quality than sole-authored papers.

Lee and Bozeman (2005) were the first to control for the possibility that co-authorship is formed endogenously, that is, that authors choose with whom to work. For example, an author may choose to collaborate because some ideas are hard to tackle individually or because he or she prefers to work with authors who have similar characteristics or intellectual skills. In particular, high assortativity in the matching process is observed in the scientific network, which suggests that less able authors mainly collaborate with authors of a similar type (less productive). If this selection is ignored, its effect is incorrectly attributed to collaboration and the estimated coefficients are biased. Lee and Bozeman (2005) deal with the endogeneity problem by using the extent to which a scientist's ties are cosmopolitan (i.e. outside the proximate work environment) as an instrument for the number of collaborators. However, there is a potential correlation between the instrument and productivity, as outside ties can directly affect productivity by providing access to new ideas (Singh, 2007). Moreover, they assume that the productivity of an author is only a function of the number of articles published in a period. I consider that the productivity of an author depends not only on the number of articles published by the individual but also on the quality and length of each article.


In addition, the previous literature does not control for unobservable heterogeneity, with the exception of Hollis (2001), who estimates a fixed effects model.
This study is the first to simultaneously control for time-invariant unobservable
factors and for the potential endogeneity inherent in co-authorship formation.
The main finding is that, once I control for the endogeneity of network formation,
co-authorship leads to higher academic productivity. This result is robust and
statistically significant. I also find evidence of the potential presence of peer
effects and congestion externalities in scientific networks.
I attempt to overcome some of the endogeneity difficulties by drawing on methods used in social network analysis. To control for endogenous collaboration formation, an instrumental variables regression is estimated. The instruments are derived from the past network of the author and are based on the common research interests between the author and her potential coauthors (the past coauthors of her coauthors) and its quadratic term. As Fafchamps et al. (2010) point out, one of the most important factors in determining the likelihood of collaboration is some commonality of research interest between the authors. On the other hand, collaboration is unlikely when there is too much overlap in skills. Hence, the quadratic term of the common research interests is included to allow for a potentially non-linear relationship between collaboration and research overlap. Both instruments capture the idea of homophily, the tendency of individuals to associate with those who have similar characteristics. Fixed effects are also included to capture individual unobserved heterogeneity.
A panel of economists over a 30-year period, from 1970 to 1999, is used to implement this empirical strategy. This panel includes economists who published in journals included in EconLit, a bibliography of economics journals compiled by the editors of the Journal of Economic Literature. I use the proportion of co-authored articles to measure the extent of collaboration, and a productivity measure created using the publication record of each author in the EconLit database. This productivity measure is based on the work of Fafchamps et al. (2010) and combines quantity (number and length of published articles) with quality (journal rank).


This chapter also contributes to the empirical study of social networks, which
is a relatively new research area. In a recent work, Fafchamps et al. (2010)
examine the formation of coauthor relations among economists over a twenty
year period. Their principal finding is that a new collaboration emerges faster
between two researchers if they are closer in the existing scientific network. In
particular, being at a network distance of 2 instead of 3 raises the probability of
initiating a collaboration by 27 percent.6
More related to this chapter are the papers that attempt a joint estimation of
network formation and network effects. Various recent papers deal with this issue.
For example, Mihaly (2009) develops an empirical strategy based on a two-step
selection model à la Heckman to control for endogenous friendship formation
with the aim of measuring the effect of peer interactions on student academic
achievement. Conti et al. (2009) estimate the effect of popularity (measured
as the number of friendship nominations received from schoolmates) on the labor market returns controlling for the selection of friendship. Their empirical
strategy is to simultaneously estimate the outcome of interest together with the
friendship formation process. Similarly, this chapter estimates simultaneously the
productivity of an author and the amount of collaboration.
This work is also related to the articles that estimate social effects using network data. Recently, Bramoullé et al. (2009) provide the necessary and sufficient conditions under which peer effects can be identified using network data. They propose the use of peers' peers' (and peers' peers' peers') characteristics as instrumental variables for the peers' behavior. In this chapter, I do not rigorously estimate peer effects in academia, although I provide evidence of their existence. Instead, the main aim is to estimate the causal effect of co-authorship on individual productivity. Using network data, I obtain exogenous variation in co-authorship through variation in the common research interests between the author and her coauthors' coauthors accumulated in the past.

6 Mayer and Puller (2008) also study the formation of links. They argue that individual-level heterogeneity reflected in differences in age, sex and race plays an important role in the matching process of individuals.
7 Other recent papers on network effects include Boucher et al. (2010), Calvó-Armengol et al. (2009), Lin (2010) and Liu et al. (2012).


1.2 Estimation framework

This project seeks to estimate the causal effect of co-authorship on an individual's output. The primary equation of interest is the productivity or output function, which is given by

$$\log(q_{i,t}) = \beta C_{i,t} + \gamma_1 t_{i,t} + \gamma_2 t^2_{i,t} + \gamma_3 H_{i,t} + \gamma_4 \bar{n}_{i,t} + \gamma_5 q1_{i,t} + \gamma_6 q2_{i,t} + \gamma_7 D_{i,t} + \alpha_i + \delta_t + \varepsilon_{i,t}, \tag{1.1}$$

where $q_{i,t}$ is the productivity of author $i$, which is obtained taking into account the quantity and quality of the articles published by the author.8 As the distribution of output is skewed to the right, I apply the log transformation to minimize the effect of highly productive individuals on the estimates. The variable $C_{i,t}$ is the co-authorship variable, obtained as the proportion of co-authored articles.9 The time-varying factors are: years since first publication, $t_{i,t}$, and its square, $t^2_{i,t}$; the degree of research specialization of the author, $H_{i,t}$; the average number of coauthors' papers, $\bar{n}_{i,t}$; the average coauthors' productivity, $q1_{i,t}$; and the average coauthors' coauthor productivity, $q2_{i,t}$. Although the average coauthors' productivity, the average coauthors' coauthor productivity and the average number of coauthors' papers might be endogenous to productivity, the bootstrap Hausman test reported in Table 1.2 shows that their inclusion does not affect the co-authorship coefficient. Moreover, their inclusion provides evidence of important network externalities. In addition, as the panel starts for each individual with her first publication in the sample and extends to the last observed publication of the author or 1999, an author has at least one article in her last year of publication. To prevent any bias from this, all regressions also include a dummy variable for the author's last year of publication, $D_{i,t}$. An individual fixed effect, $\alpha_i$, is also included to account for all time-invariant unobserved factors, such as innate ability, nationality, gender, school education, etc. Year dummies are included for each year from 1981 to 1999; these year fixed effects, $\delta_t$, account for any possible time trends in collaboration and individual performance.10 $\varepsilon_{i,t}$ is the time-varying error term. As both the productivity and the co-authorship variables are likely to be correlated over time, I cluster standard errors by author. The main parameter of interest is $\beta$, the coefficient on $C_{i,t}$, which captures the effect of co-authorship on productivity.

8 The precise definition of all the variables used in the estimation is provided in section 1.3.
9 For robustness, I also consider the average number of coauthors as the co-authorship variable. See section 1.5.

One problem when defining the productivity variable is the publication lag in economics. As pointed out in Fafchamps et al. (2010), there are considerable delays between the moment a paper is submitted to a journal and the moment it is published. Moreover, the time required for publication in economics journals varies greatly both between and within journals. These two facts could lead to a concentration of publications in some periods that do not correspond to the exact periods in which the author was working on the projects. To reduce this potential problem, all the variables, except experience, aggregate the information for the last five years, from $t-4$ to $t$ (e.g. the dependent variable is the log-average productivity from $t-4$ to $t$). Although the variables contain information about the last five years' publications, the frequency used for estimation is annual; I consider a year as a period.
Productivity might be affected by other important factors such as changes in the degree of specialization, the time devoted by coauthors to other projects, the quality of coauthors or the quality of coauthors' coauthors. Some of these time-varying factors are constructed from the scientific network. This network is defined using authors as nodes and co-authorship as network links. Following Singh (2007), I assume a collaboration tie to last for 5 years.11 Therefore, the co-authorship network is defined for each year from 1974 to 1999, and each network contains links formed from $t-4$ to $t$. The proxy variables for these time-varying factors are the following:
- The years since first publication, $t_{i,t}$, are included to control for the experience of the author. Experience in any field or job is one of the main factors influencing productivity. Moreover, more experienced authors are more likely to initiate a project with someone else, as they have more contacts and therefore more collaboration opportunities.
- The degree of research specialization, $H_{i,t}$, controls for the potential effects of specialization on productivity. Specialization allows a scientist to become an authority on a given subject (Hackett, 2005).12 On the other hand, studying a wide range of topics may facilitate the generation of new ideas and enable a researcher to tackle projects that require a broader view, as the researcher has more diverse knowledge.13 Moreover, specialization might also affect the amount of collaboration, as overly specialized authors may not be able to tackle projects that require knowledge of different fields. Therefore, they may be more willing to collaborate than authors who have a more diverse source of knowledge.
- The average number of articles published by the coauthors of author $i$, $\bar{n}_{i,t}$, is a proxy for the time devoted by $i$'s coauthors to other projects with other authors, i.e. the congestion externality. The time allocated by $i$'s coauthors to other projects with other authors might reduce the opportunities of author $i$ to initiate new projects, as he or she will have to devote more time to the collaboration. Thus, authors who work with busier coauthors are likely to have lower productivity.
- The average coauthors' productivity of author $i$, $q1_{i,t}$, captures the coauthors' quality in terms of productivity.
- The average coauthors' coauthor productivity of author $i$, $q2_{i,t}$, controls for the quality of the nodes at distance 2.

10 The first author observation in the estimation sample is for 1980. See section 1.3 for more details.
11 In the robustness section, the sensitivity of the results to this assumption is tested.

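The network-based proxies listed above can be illustrated with a minimal sketch. It assumes a toy list of (year, authors) records and hypothetical helper names (`window_papers`, `coauthor_network`, `distance_two`, `avg_coauthor_papers`); it is not the code used for the thesis, only one way to build the five-year co-authorship network, the nodes at distance 2, and the average number of papers that an author's coauthors publish without her.

```python
from collections import defaultdict

# Hypothetical toy records: (publication year, list of author ids).
papers = [
    (1990, ["a", "b"]),
    (1991, ["b", "c"]),
    (1992, ["c", "d"]),
    (1985, ["a", "e"]),   # falls outside the 1990-1994 window
    (1993, ["b"]),        # sole-authored paper by b
]

def window_papers(papers, t, width=5):
    """Papers published in the window t-(width-1), ..., t."""
    return [(y, au) for (y, au) in papers if t - width + 1 <= y <= t]

def coauthor_network(papers, t, width=5):
    """Undirected network: authors are nodes, joint papers in the window are links."""
    links = defaultdict(set)
    for _, authors in window_papers(papers, t, width):
        for a in authors:
            links[a].update(x for x in authors if x != a)
    return links

def distance_two(links, i):
    """Coauthors of i's coauthors who are neither i nor direct coauthors of i."""
    direct = links.get(i, set())
    reach = set()
    for j in direct:
        reach |= links.get(j, set())
    return reach - direct - {i}

def avg_coauthor_papers(papers, links, i, t, width=5):
    """Average number of window papers by i's coauthors, excluding papers with i."""
    direct = links.get(i, set())
    if not direct:
        return 0.0
    in_window = window_papers(papers, t, width)
    counts = [sum(1 for _, au in in_window if j in au and i not in au)
              for j in direct]
    return sum(counts) / len(counts)

net = coauthor_network(papers, t=1994)
print(sorted(net["a"]))                 # direct coauthors of a
print(sorted(distance_two(net, "a")))   # potential coauthors at distance 2
print(avg_coauthor_papers(papers, net, "a", t=1994))
```

The average coauthor productivity and the productivity of nodes at distance 2 would follow the same pattern, replacing the paper count by a productivity score.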
The main econometric problem is endogenous co-authorship formation. Authors choose with whom to work, and these associations may be influenced by unobservable characteristics. For example, an author may choose to collaborate because some ideas are hard to tackle individually. The difficulty of the project can then lead to greater cooperation and may also lead to higher output. In this case, ignoring the selection into co-authorship would lead to spurious results. To correct for this type of bias, an instrumenting strategy is implemented. Equation (1.1) is estimated by efficient two-step generalized method of moments (GMM), instrumenting $C_{i,t}$ by the common research interests between author $i$ and the nodes at distance 2 (the coauthors of $i$'s coauthors) that author $i$ accumulated from $t-10$ to $t-6$, $w2_{i,t-6}$, and its quadratic term.14 Both variables capture the degree of homophily in terms of research interests between author $i$ and the potential coauthors that author $i$ may have in the future. These instruments control for the endogeneity of co-authorship by capturing the extent of matching on the overlap in research interests of author $i$ with her potential coauthors.

12 Leahey (2006) found that specializing improves productivity.
13 Belmaker et al. (2010) found that both over-specialization and over-generalization are detrimental to academic success.

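As an illustration of the instrumenting idea, the sketch below runs plain two-stage least squares on first-differenced toy data, using the level of a research-overlap variable and its square as instruments. This is a simplification and not the efficient two-step GMM estimator described in the text: there are no controls, year dummies or clustered standard errors, the data are made up, and all function names (`solve`, `ols`, `tsls_first_diff`) are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def tsls_first_diff(d_logq, d_C, w):
    """2SLS of d_logq on d_C, instrumenting d_C with the level of w and its square."""
    Z = [[1.0, wi, wi * wi] for wi in w]
    pi = ols(Z, d_C)                                    # first stage
    d_C_hat = [sum(p * z for p, z in zip(pi, row)) for row in Z]
    return ols([[1.0, c] for c in d_C_hat], d_logq)     # [intercept, beta]

# Toy data: u is an unobserved shock that moves both collaboration and output,
# constructed to be exactly orthogonal to the instruments in this sample.
w = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
u = [1.0, -1.0] * 4
d_C = [wi - 0.2 * wi * wi + ui for wi, ui in zip(w, u)]
d_logq = [2.0 * c - ui for c, ui in zip(d_C, u)]        # true beta = 2

beta_ols = ols([[1.0, c] for c in d_C], d_logq)[1]
beta_iv = tsls_first_diff(d_logq, d_C, w)[1]
print(round(beta_ols, 3), round(beta_iv, 3))
```

In this constructed example the OLS slope is pulled away from the true coefficient by the common shock, while the instrumented estimate recovers it; with real data the gap depends on how strong and how valid the instruments are.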
The identification of the co-authorship parameter, $\beta$, comes from past variation in the research overlap between author $i$ and her potential coauthors, $w2_{i,t-6}$. Therefore, to remove the individual fixed effects, $\alpha_i$, equation (1.1) is transformed using first differences instead of the within transformation, as applying the latter would create a spurious correlation between the average of the instrumental variables and productivity.15 Given the small variability of the instruments from one period to another, I use the instrumental variables in levels to avoid the problem of weak instruments.16 The assumption imposed by the efficient two-step GMM estimator is that the variation in the error term is uncorrelated with past research overlap between author $i$ and her potential coauthors. Formally, the two-step efficient GMM estimator relies upon the following orthogonality conditions:

$$E(\varepsilon_{i,t}\, w2_{i,t-6}) = E(\varepsilon_{i,t-1}\, w2_{i,t-6}),$$
$$E(\varepsilon_{i,t}\, (w2_{i,t-6})^2) = E(\varepsilon_{i,t-1}\, (w2_{i,t-6})^2).$$

14 Following Baum et al. (2003): the efficient GMM estimator minimizes the GMM criterion function $J = N g' W g$, where $N$ is the sample size, $g$ are the orthogonality or moment conditions and $W$ is a weighting matrix. In two-step efficient GMM, the efficient or optimal weighting matrix is the inverse of an estimate of the covariance matrix of the orthogonality conditions. The efficiency gains of this estimator relative to the traditional IV/2SLS estimator derive from the use of the optimal weighting matrix, the overidentifying restrictions of the model, and the relaxation of the i.i.d. assumption. For an exactly identified model, the efficient GMM and traditional IV/2SLS estimators coincide, and under the assumptions of conditional homoskedasticity and independence, the efficient GMM estimator is the traditional IV/2SLS estimator.
15 As the average of the instrument includes the period from $t-4$ to $t$, positive values of the common field overlap in this period will be associated with positive co-authorship and positive productivity, creating a spurious correlation between the average of the instrument and the productivity variable.
16 This alternative was proposed by Arellano and Bond (1991), who developed a GMM estimator using lagged levels of the endogenous variables (an internal instrument) as instruments for the equation in first differences. Instead, I use the lagged level of the common research interest between an author and her potential coauthors (an external instrument) as an instrument for the difference in the amount of collaboration.

These instruments have a theoretical justification. They capture the idea of homophily, the tendency of individuals to associate with those who have similar characteristics. Homophily has been documented across several characteristics, such as age, race, gender, religion and occupation, e.g., Fong and Isajiw (2000), Baerveldt et al. (2004), Moody (2001), McPherson et al. (2001), Conti et al. (2009) and Mihaly (2009). I focus on homophily between researchers in terms of common research interests, measured by the research overlap and based on the idea that authors tend to collaborate with individuals who have some commonality in research interest. The homophily effect between researchers in terms of research overlap has been documented by Fafchamps et al. (2010). They show that the emergence of a new collaboration tie is decisively shaped by the commonality of research interest between the authors. They also find that the relationship between the research overlap index and the probability of collaboration follows an inverted-U curve. For example, the likelihood of forming a collaboration is highest when the research overlap index is 0.660. I include the square of the research overlap variable to capture this potential inverted-U relationship. The main idea behind the inverted-U relationship is that authors need some common research interest to initiate a collaboration. Hence, research overlap cannot be too small. On the other hand, when research overlap is too strong, collaboration is unlikely, as there is too much overlap in skills (Fafchamps et al., 2010).

These instruments are valid as long as the above orthogonality conditions are satisfied. These assumptions are plausible, as these potential coauthors are not nodes at distance 2 in the current network (obtained from nodes accumulated from $t-4$ to $t$) but in the past network (obtained from nodes accumulated from $t-10$ to $t-6$). Moreover, I do not expect research overlap to affect the future productivity of an individual through channels other than co-authorship, as this variable is only related to the matching process.17
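
The role of first differencing in this identification argument can be made explicit with a short derivation. The symbols are assumptions made for this sketch: $\beta$ denotes the co-authorship coefficient, $X_{i,t}$ stacks the remaining time-varying regressors with coefficient vector $\gamma$, $\alpha_i$ and $\delta_t$ are the individual and year effects, and $\varepsilon_{i,t}$ is the error term. Differencing removes $\alpha_i$, and the two orthogonality conditions are equivalent to requiring that the differenced error be uncorrelated with the lagged level of the instrument and its square.

```latex
\begin{align*}
\log q_{i,t} &= \beta C_{i,t} + X_{i,t}'\gamma + \alpha_i + \delta_t + \varepsilon_{i,t},\\
\Delta \log q_{i,t} &= \beta\,\Delta C_{i,t} + \Delta X_{i,t}'\gamma + \Delta\delta_t + \Delta\varepsilon_{i,t},\\
E\big(\Delta\varepsilon_{i,t}\, w2_{i,t-6}\big) &= 0, \qquad
E\big(\Delta\varepsilon_{i,t}\,(w2_{i,t-6})^2\big) = 0 .
\end{align*}
```

Because $\Delta\varepsilon_{i,t} = \varepsilon_{i,t} - \varepsilon_{i,t-1}$, each condition in the last line is the same statement as the equality of the two expectations written in terms of $\varepsilon_{i,t}$ and $\varepsilon_{i,t-1}$.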

1.3 Data

The data used for this chapter are the same as those used by Goyal et al. (2006), Fafchamps et al. (2010), Van der Leij and Goyal (2011) and Ductor et al. (2011). The data come from the EconLit database, a bibliography of journals in economics compiled by the editors of the Journal of Economic Literature.18 From this database, I use information on all articles published between 1970 and 1999 by fewer than 4 authors.19 The panel starts for each individual with her first publication in the sample and extends to the last observed publication of the author or 1999.
As already pointed out in Ductor et al. (2011), a significant fraction of economists in the EconLit database publish very infrequently and may not publish a single piece over a five-year period. To rule out such authors, I restrict attention to individuals who, at every point $t$, have published at least one piece in the previous 5 years, i.e. active authors. In addition, the model is estimated in first differences and the instruments are based on information from $t-10$ to $t-6$. As a consequence, the first six observations of each author are lost. Moreover, as the co-authorship network combines 5 years of publications, I lose the first 5 years of the sample as starting values.20 The analysis therefore only considers articles published between 1980 and 1999. Finally, authors who have never collaborated during their academic career are excluded from the sample, since their variables are constant over the time frame and a first-differences model is estimated. These authors represent 13.25% of the population of active authors.21

17 See section 1.5 for the robustness of the instrument to a potential internal validity threat.
18 See Goyal et al. (2006) for more details.
19 As Van der Leij and Goyal (2011) pointed out, in the EconLit database 77% of the articles were written by 2 authors, 19% by 3 authors, and 4% by 4 or more authors. Moreover, Van der Leij (2006, pp. 53-56) shows that the co-authorship network statistics are practically unaffected when (for a subset of the data) articles with 4 or more authors are included.
20 Authors whose first article was published before 1974 are not considered, since they do not have a defined network from the first year of their career.

1.3.1 Definition of the variables

In this section, all the variables used to estimate the outcome equation (1.1) are described.
Co-authorship, $C_{i,t}$. The amount of co-authorship by author $i$ during period $t$ is measured as the ratio between the number of co-authored articles and the total number of articles published by the individual during the period $t-4$ to $t$. Note that collaboration that does not result in journal publications is not observed. Moreover, neither informal collaboration, such as valuable comments from colleagues at conferences and/or seminars, nor the amount of effort devoted by each author to the project is observed.22
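A minimal sketch of this ratio, assuming toy (year, authors) records and a hypothetical function name `coauthorship_share`; authors with no publication in the window get `None`, which mirrors the restriction to active authors rather than any rule stated in the text.

```python
# Hypothetical toy records: (publication year, list of author ids).
records = [
    (1990, ["a", "b"]),
    (1991, ["b", "c"]),
    (1993, ["b"]),
    (1985, ["b", "e"]),   # outside the 1990-1994 window
]

def coauthorship_share(records, author, t, width=5):
    """Share of the author's articles in t-(width-1), ..., t that are co-authored."""
    own = [au for (y, au) in records
           if t - width + 1 <= y <= t and author in au]
    if not own:
        return None  # no article in the window: the share is undefined
    return sum(1 for au in own if len(au) > 1) / len(own)

print(coauthorship_share(records, "b", t=1994))  # 2 co-authored out of 3 articles
```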
Productivity, $q_{i,t}$. The productivity of author $i$ at period $t$ is measured as follows:

$$q_{i,t} = \sum_{j=1}^{S} \frac{\text{pages}_j \times \text{quality}_j}{\text{Number of authors}_j},$$

where $S$ is the total number of articles published by author $i$ from $t-4$ to $t$.

The variable quality is a measure of the quality of the journal proposed by Fafchamps et al. (2010). They construct this measure based on the work of Kodrzycki and Yu (2006), hereafter KY, who construct an impact index for a large number of economics journals. They complement KY's work by predicting the impact index of journals not included in the list compiled by KY. To do this, they regress the KY index on commonly available information such as the number of published articles per year, the impact factor, the immediacy index, the Tinbergen Institute Index, an economics dummy, interaction terms between the economics dummy and the impact factor, and various citation measures. They then use the predicted value obtained from this regression as the impact index for journals not included in the KY list.23 The actual KY impact index is used whenever available.

The pages variable measures the length of the article and is given by the number of pages of the article divided by the average number of pages of the articles published in the same journal.24 Thus, I assume that longer-than-average papers are more valuable pieces of research.25

21 On average, the productivity of authors without a co-authored article during their entire career is 1.26, whereas the average productivity of authors with at least one collaboration is 5.35. These sole authors are also younger on average. Given their characteristics, their exclusion might lead to a downward bias in the value of co-authorship.
22 One important effect of co-authorship may be to raise the likelihood of publication for a given quality of research rather than to improve the quality per se. In that case, the marginal published single-authored paper may be a better paper than the marginal published co-authored paper. This may lead to a downward bias in the value of co-authorship.

Observe that research is discounted by the number of authors, $n$, to account for the individual's contribution to the sum of output, giving $1/n$ credit to any single author.
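The productivity measure can be sketched as follows. The dictionary keys and the function name `productivity` are hypothetical, and applying the fifty-page cap before dividing by the journal's average length is an assumption about the order of operations, since the text only states that page counts are truncated from above at fifty pages.

```python
def productivity(articles, max_pages=50):
    """Sum over articles of relative length x journal quality / number of authors."""
    total = 0.0
    for art in articles:
        pages = min(art["pages"], max_pages)           # cap very long articles
        rel_length = pages / art["journal_avg_pages"]  # length relative to journal
        total += rel_length * art["quality"] / art["n_authors"]
    return total

# Hypothetical five-year publication record of one author.
record = [
    {"pages": 20, "journal_avg_pages": 20, "quality": 1.0, "n_authors": 2},
    {"pages": 80, "journal_avg_pages": 25, "quality": 0.5, "n_authors": 1},
]
q = productivity(record)
print(q)
```

The first article contributes half of its quality-weighted length to the author because it has two authors; the second is capped at fifty pages before being scaled by the journal average.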
Field concentration. To measure the degree of specialization of an author I
use the Herfindahl index. Formally, this index is defined as

Hit =

F
X

(xit,f )2 ,

f =1

where xit,f is the total fraction of articles published by author i in the field f from
t 4 to t, and F is the number of fields. To construct this variable, articles are
categorized into 121 subfields according to the first two digits of the JEL codes.
Articles with multiple JEL codes are divided and assigned proportionally to each
of the corresponding fields. This measure takes value from 1/F , reflecting the
23 Since most of the journals that KY omitted are not highly ranked, their predicted quality
index is quite small.
24 As pointed out by Sauer (1988), if journal editors act as value maximizers to allocate space,
longer articles are more valuable than those of lesser length (on average).
25 The number of pages for each article has been truncated from above at fifty pages. The
main idea is not to give too much extra value to literature review articles, as in general a literature
review paper is much longer than the average article. For robustness, I also use as productivity
only the journal quality index divided by the number of authors working on the article. See
Section 1.5.


1.3 Data

maximum degree of diversity, to 1 if the author writes all her articles in the same
field. Higher values of this index indicate a higher degree of specialization of the
author.
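The construction of the specialization index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy JEL codes are hypothetical, each article is assumed to list distinct two-digit JEL codes, and an article with k codes is assumed to contribute 1/k to each of its fields.

```python
from collections import defaultdict


def field_shares(articles):
    """Share of an author's articles in each field over the window.

    `articles` is a list of articles; each article is a list of distinct
    two-digit JEL codes. An article with k codes adds 1/k to each code
    (proportional assignment, as described in the text).
    """
    weights = defaultdict(float)
    for codes in articles:
        for code in codes:
            weights[code] += 1.0 / len(codes)
    total = len(articles)
    return {field: weight / total for field, weight in weights.items()}


def herfindahl(articles):
    """Herfindahl index: sum of squared field shares."""
    return sum(share ** 2 for share in field_shares(articles).values())


# Hypothetical author with three articles in the five-year window.
example = [["D8"], ["D8", "O1"], ["O1", "L1"]]
print(round(herfindahl(example), 4))
```

An author who publishes only in one field obtains an index of 1, while spreading the same number of articles over more fields pushes the index toward 1/F.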
Average number of articles of the coauthors. This variable is computed as
the average number of papers published by the coauthors of author i from t - 4
to t, excluding the papers published with author i. Although there is a strong
correlation between this variable and the average coauthor productivity, peer
effects or network effects are expected to affect mainly the quality of the article.
This may be because the effects come through the flow of information from one
node to another in the co-authorship network. Thus, while the average number
of papers of the coauthors of author i captures how busy the coauthors of an
author are, the average productivity of the coauthors (which takes into account
the quality of the articles) is more related to the presence of knowledge spillovers.
Average coauthors' productivity. To control for the quality of coauthors, the
average productivity of coauthors from t - 4 to t is included in equation (1). To
compute this variable, the productivity of each coauthor is calculated, excluding
the papers published with author i.
Average coauthors' coauthor productivity. This variable is computed in the
same way as the average coauthors' productivity but looking at the productivity of the coauthors of coauthors, and excluding the papers published with the
coauthors of author i.
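The exclusion of joint papers in the coauthor variables can be illustrated with a short Python sketch. It rests on assumptions not stated in the text: the function name and the toy data are hypothetical, each paper's productivity is assumed to be split equally among its authors, a coauthor's productivity is taken as the sum over her papers in the window, and an author without coauthors is assigned zero.

```python
def average_coauthor_productivity(author, papers):
    """Average productivity of an author's coauthors, excluding joint papers.

    `papers` is a list of (set_of_authors, paper_productivity) pairs for the
    window. Each paper's productivity is divided equally among its authors.
    """
    coauthors = set()
    for authors, _ in papers:
        if author in authors:
            coauthors |= authors - {author}
    if not coauthors:
        return 0.0  # assumption: no coauthors means a value of zero
    totals = []
    for coauthor in coauthors:
        # Only papers that involve the coauthor but NOT the focal author.
        own = sum(value / len(authors)
                  for authors, value in papers
                  if coauthor in authors and author not in authors)
        totals.append(own)
    return sum(totals) / len(totals)


# Hypothetical window: A writes with B and with C; B and C also publish without A.
papers = [({"A", "B"}, 6.0), ({"B"}, 4.0), ({"B", "C"}, 2.0), ({"A", "C"}, 3.0)]
print(average_coauthor_productivity("A", papers))
```

Dropping the papers written with author i keeps the author's own output from entering her measure of coauthor quality mechanically.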
Instrumental variables. The first instrument measures the common research
interests between author i and the nodes at distance 2 that this author has had
from t - 10 to t - 6, $w2_{i,t-6}$, i.e. the research overlap between author i and her
potential coauthors.
To obtain the proxy for research overlap between author i and her potential
coauthors, I use a measure that is very close to the one proposed by Fafchamps
et al. (2010). They construct an index of research overlap between any two


researchers.26 I extend their definition to construct an index of research overlap
between an author and all her potential coauthors.
To do this, Fafchamps et al. (2010) categorize articles into 121 subfields
according to the first two digits of the JEL codes. Articles with multiple JEL
codes are divided and assigned proportionally to each of the corresponding fields.
Then, the cosine similarity measure is considered to be a measure of field overlap
between i and all her nodes at distance 2 accumulated from t - 10 to t - 6,
$N^2_{t-6}(i)$.27 This measure is computed as follows: Suppose that $x^{i}_{t-6,f}$ is the fraction
of articles written by i in field f in the period from t - 10 to t - 6 (such that
$\sum_f x^{i}_{t-6,f} = 1$) and $x^{N^2_{t-6}(i)}_{t-6,f}$ is the fraction of articles written by the potential
coauthors of author i in field f from t - 10 to t - 6. Then, the research overlap
index is

$$w2_{i,t-6} = \frac{\sum_f x^{i}_{t-6,f}\, x^{N^2_{t-6}(i)}_{t-6,f}}{\sqrt{\sum_f \left(x^{i}_{t-6,f}\right)^2 \sum_f \left(x^{N^2_{t-6}(i)}_{t-6,f}\right)^2}}.$$
This variable takes values from 0, if i and her potential coauthors did not
write any paper in the same field, to 1 if i and her potential coauthors wrote in
exactly the same fields and in exactly the same proportion.
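The overlap index is a cosine similarity between two vectors of field shares, which the following Python sketch makes concrete. The function name and the toy share vectors are hypothetical, and returning zero when one of the vectors is empty is an assumption made here for convenience.

```python
import math


def research_overlap(own_shares, potential_shares):
    """Cosine similarity between two field-share dictionaries.

    Keys are field codes and values are shares of articles in that field.
    Returns 0.0 if either vector is all zeros (an assumption of this sketch).
    """
    fields = set(own_shares) | set(potential_shares)
    dot = sum(own_shares.get(f, 0.0) * potential_shares.get(f, 0.0)
              for f in fields)
    norm_own = math.sqrt(sum(v * v for v in own_shares.values()))
    norm_pot = math.sqrt(sum(v * v for v in potential_shares.values()))
    if norm_own == 0.0 or norm_pot == 0.0:
        return 0.0
    return dot / (norm_own * norm_pot)


# Hypothetical shares for author i and for her nodes at distance 2.
own = {"D8": 0.5, "O1": 0.5}
potential = {"O1": 0.5, "L1": 0.5}
overlap = research_overlap(own, potential)
instruments = (overlap, overlap ** 2)  # the index and its square
print(instruments)
```

The second element of the tuple corresponds to the squared term used as the second instrument.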
The second instrument is the square of this variable and it is introduced
to account for the potential inverted-U relationship between field overlap and
collaboration documented by Fafchamps et al. (2010).
The instruments are measured using publications from t - 10 to t - 6 to
avoid spurious correlation with the productivity variable and its lag, which are
measured from t - 4 to t and from t - 5 to t - 1, respectively.
26 See Fafchamps et al. (2010) for more details.
27 The research overlap measure captures not just having worked in similar research areas
but also overlap in research topics. For instance, if a researcher has worked on, say, development economics and microeconomic theory (2 separate categories in JEL codes), she may be
more likely to work with another researcher who has also focused on development and micro
(Fafchamps et al., 2010).


1.3.2 Descriptive analysis

Figure 1.1 plots the relationship between the total average output measure and
career time using all individuals in the data set.

Figure 1.1: Total Average Productivity across Career time. Full sample
[Plot: total average productivity from t - 4 to t against career time, with fitted values and 95% CI.]
The sample consists of articles published from 1974-1999. All authors are considered except
those whose first article was published before 1974.

As can be observed, there is a rapid increase in total average output between
the first publication and the ninth year after the first publication, and then a
steady decline starts and continues to the sixteenth year. The negative trend in
output after 9 years is due to fewer articles and lower quality. The increase in
productivity after the fifteenth year of experience is a consequence of the definition
of the panel, in which an author has at least one publication in the last year of
her career. Moreover, the proportion of authors who have more than 15 years of
experience is high, 40% of the total author population; thus, these authors will
have at least one publication between the 16th and 21st years of their career, leading
to the positive trend observed in Figure 1.1 at the end of their academic life.28
28 For robustness, the main regressions are also performed extending the panel until the four
subsequent years after the last publication. The main results remain under this panel duration
and are available upon request from the author. I also consider a different specification including
a cubic term of experience; the results are quantitatively the same.


Figure 1.2 shows the relationship between the total average output and career
time using only the sample of active authors. Notice that active authors tend to
publish more regularly as there is a rapid increase in the total average productivity
until the thirteenth year after the first publication, and then it practically remains
constant. The steady decline in the total average productivity in this sub-sample
is not observed.

Figure 1.2: Total Average Productivity across Career time. Active Sample
[Plot: total average productivity from t - 4 to t against career time, with fitted values and 95% CI.]
The sample consists of articles published from 1974-1999. Only active authors are
considered.

The non-linear relationship between experience and productivity motivates
including the square of experience in the regressions. This quadratic term captures
the decreasing returns to experience or academics' life-cycle effects.
Figure 1.3 graphs co-authorship against experience for the sample of active
authors. Observe that as the authors become more experienced, the average number of coauthors per article increases. This informally suggests that individuals
with more contacts are more likely to collaborate. On average, economists in
the first years after obtaining their Ph.D. have not established many contacts,
thus the opportunities to publish a paper with other colleagues are scarce. Then,
as the author becomes more experienced and known by others, the likelihood of
collaborating increases.


Figure 1.3: Total Average number of coauthors on Career time. Active Sample
[Plot: total average number of coauthors from t - 4 to t against career time, with fitted values and 95% CI.]
The sample consists of articles published from 1974-1999. Only active authors are
considered.

Table 1.1 provides summary statistics of the different variables used in the
estimation. Column 1 provides summary statistics for authors with average lifetime co-authorship between 0 and 0.5 (low-average co-authorship). Column 2
provides statistics for authors with average lifetime co-authorship between 0.5
and 1 (high-average co-authorship).
Note that for authors with a low average lifetime co-authorship the mean
productivity is 4.35, while for those authors with a high average lifetime co-authorship the mean productivity is 5.76. Moreover, the latter authors have higher
average coauthors' productivity and higher overlap in research interests with
nodes at distance 2. Also note the high variability of the productivity and network
variables, whose standard deviation is generally much higher than the mean.


Table 1.1: Summary statistics of the data

                                            Low-co-authorship     High-co-authorship
Variables                                   mean      st.d.       mean      st.d.
Avg. Productivity from t-4 to t             4.35      10.12       5.76      11.16
Avg. Pages from t-4 to t                    8.62      5.25        9.25      5.32
Avg. Quality from t-4 to t                  4.19      8.79        5.18      9.24
Experience                                  9.60      4.45        9.85      4.53
Co-authorship                               .23       .27         .77       .27
Avg. Coauthors Productivity                 11.16     35.87       21.62     41.10
Avg. Coauthors Coauthor Prod.               9.11      27.25       19.25     33.10
Avg. Coauthors papers                       1.81      3.39        3.28      3.41
Research overlap with Coauthors Coauthor    .14       .25         .37       .33
Degree of Specialization                    .36       .24         .35       .23
Number of Observations                      39585                 68501
Number of Authors                           6726                  10652

a. These statistics correspond to the active author sample and publications from 1974 to 1999.
The first four years of each author are not considered as the productivity and co-authorship
variables are not defined.

1.4 Results

This section presents the results from estimations of the outcome equation (1).

1.4.1 Does co-authorship lead to higher academic productivity?

The main question of interest is whether co-authorship affects productivity once
it is discounted by the number of authors. In this section, I estimate equation
(1). I first provide results of the estimated model without controlling for the selection
into co-authorship; estimates controlling for the potential endogeneity of co-authorship formation are then provided.
Column 1 of Table 1.2 shows results of the first-difference specification in
which the independent variables include co-authorship, years since first publication, a dummy variable for the author's last year of publication and year dummies.
Column 2 provides estimates from a first-difference regression of equation (1) controlling for unobservable individual heterogeneity and time varying factors but
not for the endogeneity of co-authorship. The implication of these regressions is


that co-authorship leads to lower academic productivity. This result is consistent
with Hollis (2001), who finds a negative effect of co-authorship on productivity.
To correct for the possible bias of the co-authorship measure, the instrumenting strategy described in section 1.2 is implemented. Column 3 of Table
1.2 presents the results controlling for the co-authorship formation process but
not for the time varying factors. Column 4 shows the results from estimating
equation (1) controlling for the endogeneity of co-authorship and controlling for
the degree of specialization of the author, the average number of coauthors' papers,
the quality of coauthors and the quality of the coauthors of coauthors.29 Observe that
the coefficient of co-authorship becomes significantly positive after instrumenting, which shows that the individual productivity increases as authors substitute
sole-authorship by teamwork. In other words, for example, the total productivity
of two authors collaborating on two published papers is greater than the total
productivity expected by each of the two authors writing a sole-authored paper.
One possible explanation for the change of the sign of the co-authorship variable
after instrumenting could be that authors have some periods where they have
better ideas than in other periods. On the one hand, collaboration is more likely
in periods where the author has a lack of ideas, since he or she is more willing to
accept any co-authored project of any quality. On the other hand, in periods of
good ideas, the author will only share ideas that require skills of other researchers.
Then, good ideas not requiring specialization will be sole-authored, resulting in
a positive correlation between sole-authorship and productivity (Hollis, 2001).
Once the instrumental variable strategy is implemented, only exogenous variations of the co-authorship variable through variations on the common research
interest between author i and her potential coauthors are considered. As the distribution and fields of specialization of these potential coauthors affects the range
and diversity of dispersed knowledge that an author can access (Bonacich, 1987),
having more diverse contacts might help an individual to create new knowledge
combinations, driving the benefits from exogenous co-authorship.
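The sign reversal after instrumenting can be reproduced in a small simulation. The sketch below is not the estimation used in this chapter: it is a stylized, just-identified example with a single hypothetical instrument, simulated data, and made-up parameter values, in which unobserved idea quality lowers collaboration and raises output. All function and variable names are assumptions introduced for illustration.

```python
import random


def simulate_panel(n_authors=2000, n_years=6, beta=1.0, seed=0):
    """Simulate (y, x, z) rows per author; beta is the true effect of x on y."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_authors):
        ability = rng.gauss(0.0, 2.0)           # author fixed effect
        rows = []
        for _ in range(n_years):
            z = rng.gauss(0.0, 1.0)             # instrument (e.g. past overlap)
            ideas = rng.gauss(0.0, 1.0)         # unobserved idea quality
            # Good ideas reduce collaboration but raise output.
            x = 0.8 * z - 0.8 * ideas + rng.gauss(0.0, 0.5)
            y = ability + beta * x + 3.0 * ideas + rng.gauss(0.0, 1.0)
            rows.append((y, x, z))
        panel.append(rows)
    return panel


def first_differences(panel):
    """Difference consecutive years within each author (removes fixed effects)."""
    diffs = []
    for rows in panel:
        for prev, cur in zip(rows, rows[1:]):
            diffs.append(tuple(c - p for c, p in zip(cur, prev)))
    return diffs


def covariance(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def fd_ols_and_iv(panel):
    """Slope of dy on dx by OLS and by IV with dz as the instrument."""
    dy, dx, dz = zip(*first_differences(panel))
    ols = covariance(dx, dy) / covariance(dx, dx)
    iv = covariance(dz, dy) / covariance(dz, dx)
    return ols, iv


ols, iv = fd_ols_and_iv(simulate_panel())
print(f"FD-OLS: {ols:.2f}, FD-IV: {iv:.2f}")
```

In this design the first-difference OLS slope is pulled below zero by the omitted idea-quality shock, while the IV slope uses only the variation in collaboration that is driven by the instrument and should lie close to the true value of 1.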
We can observe some evidence of the presence of knowledge spillover: the
higher the productivity of coauthors, the higher the productivity of the author is.
29 First stage estimates are presented in Table 1.8.


Table 1.2: The effect of co-authorship on academic productivity

                                    (1) FD     (2) FD     (3) FD-IV   (4) FD-IV
Co-authorship                       .3991      .5099      2.6906      2.6021
                                    (.0288)    (.0281)    (.5270)     (.6570)
Experience                          .1395      .0790      .0027       .0473
                                    (.0421)    (.0385)    (.0607)     (.0590)
(Experience)^2                      .0042      .0034      .0022       .0016
                                    (.0003)    (.0003)    (.0005)     (.0005)
Degree of Specialization                       1.5103                 1.4294
                                               (.0325)                (.0466)
Avg. Number of Coauthors Papers                .0162                  .0193
                                               (.0023)                (.0041)
Avg. Coauthors Productivity                    .0009                  .0019
                                               (.0002)                (.0003)
Avg. Coauthors Coauthor Prod.                  .0001                  .0002
                                               (.0002)                (.0004)
Authors Last Year of Publication    .2911      .2010      .2692       .1831
                                    (.0126)    (.0125)    (.0178)     (.0172)
Year Dummies                        YES        YES        YES         YES
Number of Authors                   10835      10835      10835       10835
Number of Observations              70794      70794      70794       70794

Cragg-Donald Wald F statistic       24.30      38.40
Hansen J-test (p-value)             3.642(.0563)      0.199(.6556)
Endogeneity Test (p-value)          5.465(.0000)      36.062(.0000)
Bootstrap Hausman Test (p-value)    0.15(.7016)

b. The sample consists of active authors and publications from 1980-1999. Year fixed effects were included in the analysis, but are not
reported here to conserve space. The Bootstrap Hausman Test evaluates if the difference of the co-authorship parameter estimated by
model (3) and model (4) is significantly different from zero. Standard errors in parentheses adjusted for clusters. Significant at 1%
level, Significant at 5% level.


On the other hand, the quality of authors at a higher distance in the co-authorship
network does not affect individual performance. The negative sign of the average
number of coauthors' papers reflects the congestion externality idea. For example,
the busier the coauthors of author i are, the less time they devote to research
projects with this author and the lower the output of author i is. The above
network variables might be endogenous to productivity. However, their inclusion
does not significantly change the effect of co-authorship on academic productivity.
Using the Bootstrap Hausman test, we cannot reject the null hypothesis that both
estimators of the co-authorship parameter are equivalent.30 Career time has a
negative impact on productivity for authors with more than 6 years of experience,
consistent with academics' life-cycle effects.31 Specialization has a negative
effect on productivity, consistent with the findings of Belmaker et al. (2010).32
Regarding the empirical validity of the instruments, I reject the null hypothesis of weak instrumental variables. Thus, the instrumental variables are sufficiently correlated with the troublesome variable, co-authorship. Hansen's J
test is used for testing the null that the overidentifying moment conditions are
true. From Table 1.2, we cannot reject the null that the instruments are valid.
Moreover, it is clear from the endogeneity test that the variable co-authorship
cannot be treated as exogenous.

1.4.2 Co-authorship and productivity across individual types

I am also interested in the relationship between co-authorship and productivity
across different types of individuals. As already pointed out by Ductor et al.
(2011), it is expected that the benefits from a collaboration differ across individuals. Access to ideas is an opportunity and it takes ability and effort to publish a
high quality article. Thus, it is reasonable to suppose that the potential benefits
from a collaboration vary with the abilities and efforts of a person (Ductor et al.,
30 See Appendix A3 for a description of the test.
31 As a consequence of the empirical strategy, the first observation of an author corresponds
to the seventh year after the publication of her first article.
32 I also consider another specification including field fixed effects using the 121 JEL codes. The
results, not presented for the sake of brevity, are available upon request from the author. All
the results are qualitatively the same under this specification.


2011). The main hypothesis is that more able researchers can exploit the benefit
from co-authorship to a greater extent.
In order to analyze the potential difference in the effect of co-authorship between the different types of individuals, I divide the sample into two groups depending
on the productivity of the first publication. Summary statistics reported in Table 1.9 suggest that the first publication of an author is a very good predictor
of the future potential performance of an author. Then, the estimation strategy
is performed in each sample. Table 1.3 presents the effect of co-authorship on
productivity across high and low productive individuals.33 Column 1 shows the
estimation results for authors whose first publication productivity is above the
median, i.e. greater than the 50th percentile of the distribution of first publication productivity. Column 2 presents the results for those authors whose first publication
productivity is equal to or below the median.
In Table 1.3, we can observe that the effect of co-authorship on economists'
productivity varies significantly between the different types of individuals. As
expected, more able authors obtain more benefits from co-authorship. Authors
whose first publication productivity is equal to or below the median do not obtain
a benefit from co-authorship, i.e. the coefficient is only statistically significant
at the 10.8% level of significance. From the endogeneity test, the null that co-authorship is exogenous is rejected for both individual types. In the summary
statistics across authors, see Table 1.9, we can see a clear assortativity in the
matching process of the authors. More able authors tend to have high-productive
coauthors while less able authors collaborate with low-productive researchers. It
is more likely that the benefit from collaboration arises when there is a matching
between two different types of authors, for example, mentoring collaboration, as
learning effects are inherent to this type of collaboration. An interesting extension
is to study the mentoring effects and the role of mentors in initiating the
co-authorship network of the junior researcher.
33 Unfortunately, the weakness of the instrument in small samples does not allow dividing the
sample into more groups.


Table 1.3: The effect of co-authorship on academic productivity across individual
types

                                    (1) >50%          (2) ≤50%
Co-authorship                       3.0472            1.0904()
                                    (.9815)           (.6778)
Experience                          .1409             .1033
                                    (.0860)           (.0005)
(Experience)^2                      .0006             .0043
                                    (.0009)           (.0008)
Degree of Specialization            1.4679            1.3670
                                    (.0742)           (.0508)
Avg. Number of Coauthors Papers     .0503             .0105
                                    (.0179)           (.0131)
Avg. Coauthors Productivity         .0019             .0024
                                    (.0004)           (.0006)
Avg. Coauthors Coauthor Prod.       .0006             .0013
                                    (.0005)           (.0006)
Authors Last Year of Publication    .1469             .2188
                                    (.0262)           (.0199)
Year Dummies                        YES               YES
Number of Authors                   6072              4763
Number of Observations              42490             28304
Cragg-Donald Wald F statistic       12.47             16.00
Hansen J-test (p-value)             .225(.6354)       .408(.5229)
Endogeneity Test (p-value)          23.900(.0000)     6.180(.0129)

c. Column 1 shows the estimation results for authors whose first publication productivity is
above the median. Column 2 presents the results for those authors whose first publication
productivity is equal to or below the median. Standard errors in parentheses adjusted for
clusters. Significant at 1% level, Significant at 5%, () Significant at 10.8%.

1.5 Robustness

The main result is the positive relationship between scientific collaboration and
individual output after controlling for the endogeneity of the co-authorship. However, I need to be concerned with some potential problems.

1.5.1 Not appropriate proxy variables.

It is possible that I am using an inappropriate measure of productivity. For


example, it might be that the length of an article is not an important factor in
measuring the productivity of a journal article.


It is also possible that I am using an incorrect measure of co-authorship and
that the ratio of the number of co-authored articles with respect to the total
number of articles is not a good proxy for the amount of teamwork. This is because I am
assuming that the only factor determining the amount of intellectual collaboration
is the proportion of co-authored papers. However, another important factor in
measuring the amount of teamwork is the number of authors working on each
article.
In this subsection, I want to check if the results change with the specification of
the productivity and co-authorship variables. Firstly, I redefine the productivity
of an article as the journal quality index divided by the number of authors working
on the article, that is, the new productivity variable is defined as

$$q_{i,t} = \sum_{j=1}^{S} \frac{\text{Quality}_j}{\text{Number of authors}_j},$$

where S is the number of articles published by author i from t - 4 to t.
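As a quick worked illustration of this quality-only measure, the following Python sketch sums journal quality divided by the number of authors over the articles in the window. The function name and the numbers are hypothetical.

```python
def quality_only_productivity(articles):
    """Sum over articles of journal quality divided by the number of authors.

    `articles` is an iterable of (journal_quality, number_of_authors) pairs
    for the articles an author published in the window.
    """
    return sum(quality / n_authors for quality, n_authors in articles)


# Hypothetical window with three articles: one two-authored, one solo,
# one three-authored.
window = [(10.0, 2), (6.0, 1), (9.0, 3)]
print(quality_only_productivity(window))
```

Article length plays no role here, so any effect of co-authorship found with this measure cannot come from the pages component.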


Results presented in Table 1.4 show that the positive relationship between
co-authorship and productivity is not caused by the introduction of length as a
factor in measuring productivity.34
Then, I redefine the co-authorship variable as the average number of authors
for all the articles published by an author from t - 4 to t. Table 1.5 presents the
results using the new proxy variable for co-authorship. We can observe that the
positive relationship between intellectual collaboration and intellectual output is
not caused by the proxy variable used for co-authorship.
Therefore, the relation between intellectual collaboration and intellectual output is
positive and robust to the specification of the productivity and co-authorship
variables.
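The difference between the two co-authorship proxies can be seen in a small Python sketch. The function names and the example window are hypothetical; the input is simply the number of authors on each article an author published in the window.

```python
def coauthorship_share(author_counts):
    """Main proxy: share of articles in the window with more than one author."""
    return sum(1 for n in author_counts if n > 1) / len(author_counts)


def average_team_size(author_counts):
    """Alternative proxy: average number of authors per article."""
    return sum(author_counts) / len(author_counts)


# Hypothetical window: two solo papers, one two-author and one three-author paper.
counts = [1, 2, 3, 1]
print(coauthorship_share(counts), average_team_size(counts))
```

The share treats a two-author and a three-author paper alike, whereas the average team size also reflects how many collaborators work on each article.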

34 The results for the FD estimator are also very similar to the main regressions but are not
presented for the sake of brevity.


Table 1.4: The effect of co-authorship on academic productivity using other proxy
for productivity

                                    (1) FD-IV
Co-authorship                       2.8198
                                    (.6792)
Experience                          .0749
                                    (.0613)
(Experience)^2                      .0016
                                    (.0005)
Degree of Specialization            1.3459
                                    (.0484)
Avg. Number of Coauthors Papers     .0444
                                    (.0127)
Avg. Coauthors Productivity         .0018
                                    (.0003)
Avg. Coauthors Coauthor Prod.       .0005
                                    (.0004)
Authors Last Year of Publication    .1683
                                    (.0125)
Year Dummies                        YES
Number of Authors                   10835
Number of Observations              70794
Cragg-Donald Wald F statistic       24.295
Hansen J-test (p-value)             0.746(.3877)
Endogeneity Test (p-value)          47.939(.0000)

d. In this analysis the productivity of an article only depends on the quality of the article.
Standard errors in parentheses adjusted for clusters. Significant at 1% level.

1.5.2 Co-authorship tie duration.

In the main analysis, I have considered that a collaboration tie lasts for 5 years,
i.e. a co-authorship network at period t contains links formed from t - 4 to t. In
this subsection, I check if the results are sensitive to this assumption by considering
that the effect of a collaboration tie persists for a shorter period, 3 years. Firstly, I
compute all the variables combining 3 years of publications, e.g. now the average
coauthors' productivity is obtained using publications and coauthors accumulated
from t - 2 to t, the dependent variable is the average productivity from t - 2 to
t, etc. As in the main analysis, only authors who publish at least one piece of
research every five years are considered. Consequently, an author may not publish
a piece of research from t - 2 to t. Therefore, the log transformation of the


Table 1.5: The effect of co-authorship on academic productivity using other proxy
for co-authorship

                                    (1) FD-IV
Co-authorship                       2.6466
                                    (.7918)
Experience                          .0449
                                    (.0652)
(Experience)^2                      .0011
                                    (.0007)
Degree of Specialization            1.3799
                                    (.0484)
Avg. Number of Coauthors Papers     .0455
                                    (.0162)
Avg. Coauthors Productivity         .0023
                                    (.0005)
Avg. Coauthors Coauthor Prod.       .0009
                                    (.0007)
Authors Last Year of Publication    .1503
                                    (.0252)
Year Dummies                        YES
Number of Authors                   10835
Number of Observations              70794
Cragg-Donald Wald F statistic       11.517
Hansen J-test (p-value)             0.063(.8017)
Endogeneity Test (p-value)          33.150(.0000)

e. The proxy variable for co-authorship is the average number of authors per article from t - 4
to t. Standard errors in parentheses adjusted for clusters. Significant at 1% level.

dependent variable cannot be used; instead, the log($q_{i,t}$ + 1) transformation is
applied. Finally, the effect of co-authorship on output is estimated using the
same empirical strategy described in section 2.1. Table 1.6 suggests that the main
conclusions remain; however, the numerical magnitude of the coefficient of co-authorship is smaller than that of the analogous coefficient in Table 1.2. The
externalities accrued from the network might take a long period to affect the
productivity of an author; this could explain the smaller effect of co-authorship
under this shorter tie duration.


Table 1.6: The effect of co-authorship on academic productivity assuming a different co-authorship tie duration

                                    (1) FD-IV (3-years)
Co-authorship                       1.0662
                                    (.5325)
Experience                          .0121
                                    (.0450)
(Experience)^2                      .0015
                                    (.0004)
Degree of Specialization            1.0188
                                    (.0192)
Avg. Number of Coauthors Papers     .0258
                                    (.0170)
Avg. Coauthors Productivity         .0018
                                    (.0002)
Avg. Coauthors Coauthor Prod.       .0006
                                    (.0004)
Authors Last Year of Publication    .1130
                                    (.0119)
Year Dummies                        YES
Number of Authors                   15233
Number of Observations              91917
Cragg-Donald Wald F statistic       13.776
Hansen J-test (p-value)             .730(.3928)
Endogeneity Test (p-value)          8.312(.0039)

f. In this analysis, I assume that a collaboration tie lasts for 3 years. The sample of articles
analyzed here is from 1978-1999. As in the main analysis, I consider authors who publish at
least one piece of research every five years. Year fixed effects were included in the analysis, but
are not reported here to conserve space. Standard errors in parentheses adjusted for clusters.
Significant at 1% level, Significant at 5%.

1.5.3 Research overlap and coauthors coauthor productivity.

The main identification strategy relies on the assumption that the past common
research overlap between an author and her potential coauthors does not affect
future changes in productivity through channels other than co-authorship.
However, it is possible that an author might change her degree of field specialization to meet productive potential coauthors and obtain benefits from them
that are not passed to productivity through co-authorship, e.g. favoritism in
the review process. I do not think this is an internal threat to the validity of the


instruments, as changes in fields of specialization require an important investment
of time and effort, which is probably not compensated by the potential benefits that an author might obtain by meeting productive potential coauthors,
assuming that such favoritism exists. Nevertheless, the aim of this subsection is
to test the potential existence of this internal threat. In an attempt to evaluate
the validity of the instruments against this threat, I estimate how changes in the research overlap between an author and her coauthors are affected by the average
productivity of her potential coauthors. The main specification of interest is

$$w_{i,t} = \beta_1 t_{i,t} + \beta_2 t^2_{i,t} + \beta_3 w2_{i,t} + \beta_4 q2_{i,t} + \beta_5 q2_{i,t-6} + \beta_6 D_{i,t} + \alpha_i + \delta_t + u_{i,t} \qquad (2)$$

where $w_{i,t}$ is the common research overlap between an author and her coauthors
from t - 4 to t, $w2_{i,t}$ is the common research overlap between an author and all
her coauthors of coauthors from t - 4 to t, $t_{i,t}$ is the years since first publication,
$q2_{i,t}$ is the current average productivity of the coauthors of coauthors,
$D_{i,t}$ is a dummy for the author's last year of publication and $q2_{i,t-6}$ is the main variable
of interest, the average productivity of the coauthors of coauthors from t - 10
to t - 6. An individual fixed effect, $\alpha_i$, is included to account for the individual
unobserved heterogeneity. Year fixed effects, $\delta_t$, capture any possible time
trends in research overlap and potential coauthors' productivity. Table 1.7 shows
the results of estimating equation (2) in first differences.
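To see why estimating equation (2) in first differences removes the individual effect, the step can be written out explicitly. This is a sketch under assumed notation: the coefficient names $\beta_1,\dots,\beta_6$, the individual effect $\alpha_i$ and the year effect $\delta_t$ are labels chosen here for illustration, and $\Delta$ denotes the change between t - 1 and t.

```latex
\begin{align*}
\Delta w_{i,t} &\equiv w_{i,t} - w_{i,t-1} \\
  &= \beta_1 \Delta t_{i,t} + \beta_2 \Delta t^2_{i,t} + \beta_3 \Delta w2_{i,t}
     + \beta_4 \Delta q2_{i,t} + \beta_5 \Delta q2_{i,t-6} + \beta_6 \Delta D_{i,t} \\
  &\quad + (\alpha_i - \alpha_i) + \Delta \delta_t + \Delta u_{i,t}.
\end{align*}
```

Since $\alpha_i - \alpha_i = 0$, time-invariant author characteristics drop out, and the coefficient on past potential coauthors' productivity is identified from within-author changes only.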
The results suggest that the average potential coauthors productivity does
not affect the common research overlap between an author and her coauthors at
period t. Therefore, authors do not change their fields of specialization according
to the productivity of their potential coauthors.35

35 A different specification using the degree of specialization of the authors as the dependent
variable instead of the common research overlap leads to the same conclusion.


Table 1.7: Common research overlap and potential coauthors productivity

                                                             (1) FD
Average Potential Coauthors Productivity                     .00003
                                                             (.00002)
Current Average Coauthors Coauthor Productivity              .00001
                                                             (.00004)
Experience                                                   .00144
                                                             (.00127)
(Experience)^2                                               .00005
                                                             (.00005)
Current Research Overlap with the Coauthors of Coauthors     .61451
                                                             (.00635)
Authors Last Year of Publication                             .01189
                                                             (.00124)
Year Dummies                                                 YES
Number of Authors                                            14832
Number of Observations                                       76469

g. In this analysis, I examine how potential coauthors' productivity, measured by the past
average coauthors' coauthor productivity from t - 10 to t - 6, affects the common research
overlap between an author and her coauthors from t - 4 to t. The sample of articles analyzed
here is from 1981-1999. I consider authors who publish at least one piece of research every five
years. Year fixed effects were included in the analysis, but are not reported here to conserve
space. Standard errors in parentheses adjusted for clusters. Significant at 1% level,
Significant at 5%.

1.6 Conclusions

The aim of this chapter is to analyze the effect of intellectual collaboration on
individual academic productivity. The approach proposed allows controlling for
unobservable heterogeneity, time varying factors and the potential endogeneity
of teamwork formation. None of the previous studies have controlled for all these
potential sources of endogeneity simultaneously. This analysis, based on a panel of
129,003 economists over a 30 year period, from 1970 to 1999, reveals the following:
First, greater collaboration leads to higher academic productivity even after
discounting by the number of authors working on an article. The positive relationship between intellectual collaboration and intellectual output is in contrast
with Medoff (2003) and Hollis (2001), who find a negative relationship between
co-authorship and academic output.


Second, co-authorship selection is endogenous, i.e., authors choose with whom to work depending on the quality and difficulty of their projects, which suggests that previous results might be spurious. Specifically, the results turn from a significant negative effect of co-authorship on individual academic productivity in the baseline model to a significant positive effect once endogenous team formation is controlled for.
Third, over-specialization is detrimental to an author's productivity. I also find evidence of peer effects and congestion externalities in academic research.
Finally, the effect of co-authorship on economists' productivity varies significantly across types of individuals. More able authors obtain greater benefits from teamwork. For example, authors whose first-publication productivity is below the median do not obtain a statistically significant benefit from co-authorship. This might be a consequence of the high assortativity in the matching process, which suggests that less able authors mainly collaborate with authors of a similar, low-productivity type.
The results are important for a number of reasons. They justify the existence of governmental policies and funding to promote collaboration. However, these policies should not be addressed to all individuals in the same manner, as the benefits from collaboration are not exploited by low-productivity individuals. Policies aimed at inducing a mixed matching process, e.g., mentoring collaboration, could facilitate the learning process of low-productivity authors and increase their current and future research output. The results are also important for economists themselves, as collaboration might enhance their performance and therefore facilitate access to research funding, higher salaries and prestige.
Future studies could analyze the effect of different types of collaboration, e.g., mentoring or external collaboration, on academic productivity, as they might have different policy implications. Moreover, understanding the channel through which collaboration increases productivity is of great importance.

1.7 Appendix

A1. First stage estimates


In Table 1.8, I provide estimates of the first stage regressions. Column 1 shows the first stage results of the main regressions. Column 2 presents the results of the first stage using the average number of coauthors per article as the proxy for co-authorship. Column 3 shows the first stage results using three-year window variables.
The signs of the instruments suggest that a high common research interest in the past between author i and her potential coauthors (nodes at distance 2) is needed for these authors to initiate a collaboration in the future. The opposite signs of the research overlap variables are a consequence of the higher correlation of these variables with the lag of co-authorship. The remaining signs are as expected. Note that the average productivity of coauthors and the average productivity of coauthors of coauthors have a negligible effect on co-authorship, -.0003 and .0005, respectively. Thus, their inclusion does not significantly affect the magnitude of the co-authorship effect on academic productivity.
A2. Summary statistics
Table 1.9 shows statistics (mean and standard deviation) across the different types of individuals. Column 1 shows the mean and standard deviation of the different variables for authors in the top 50% of the distribution of first-publication productivity. Column 2 shows the same statistics for authors at or below the median of that distribution.
Note that the average productivity of an author's coauthors is closely related to the productivity of the author, reflecting the assortativity inherent in the matching process.
A3. Bootstrap Hausman test
I want to test $H_0: \operatorname{plim}(\hat{\beta} - \tilde{\beta}) = 0$, where $\hat{\beta}$ is the coefficient of co-authorship estimated in the model including the network variables and the degree of specialization (results presented in Table 2, Column 4) and $\tilde{\beta}$ is the coefficient of co-authorship estimated in the model ignoring these potentially endogenous variables (results presented in Table 2, Column 3). Under the standard assumptions, each estimator is asymptotically normal and so is their difference. Taking the usual quadratic form,

\[ H = (\hat{\beta} - \tilde{\beta})^{\top} \left\{ \hat{V}(\hat{\beta} - \tilde{\beta}) \right\}^{-1} (\hat{\beta} - \tilde{\beta}) \overset{a}{\sim} \chi^2(1). \]

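The standard Hausman test requires that one of the estimators be fully efficient under $H_0$. Clearly, this condition is not satisfied, since errors are correlated within individuals and cluster-robust standard errors are used in both estimations. Therefore, a bootstrap is implemented to estimate $V(\hat{\beta} - \tilde{\beta})$ without the need to assume that one of the estimators is fully efficient under $H_0$. The variance $V(\hat{\beta} - \tilde{\beta})$ is estimated using

\[ \hat{V}(\hat{\beta} - \tilde{\beta}) = \frac{1}{T-1} \sum_{t} \left( (\hat{\beta}_t - \tilde{\beta}_t) - \frac{1}{T} \sum_{t} (\hat{\beta}_t - \tilde{\beta}_t) \right) \left( (\hat{\beta}_t - \tilde{\beta}_t) - \frac{1}{T} \sum_{t} (\hat{\beta}_t - \tilde{\beta}_t) \right)^{\top}, \]

where $\hat{\beta}_t$ and $\tilde{\beta}_t$ are the estimates obtained in bootstrap replication t and T = 400 is the number of replications. For a full description of the test, see Cameron and Trivedi (2005, chapter 11).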
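The calculation in A3 can be sketched as follows for a scalar coefficient. This is a minimal illustration, not the code used in the thesis: the function name `bootstrap_hausman` and its arguments are hypothetical, and the sketch assumes that the two lists of bootstrap estimates were obtained by re-estimating both models on the same resamples (for example, resampling whole authors to respect clustering).

```python
import math


def bootstrap_hausman(beta_hat, beta_tilde, boot_hat, boot_tilde):
    """Hausman-type statistic for one coefficient with a bootstrap variance.

    beta_hat, beta_tilde: full-sample estimates from the two models.
    boot_hat, boot_tilde: estimates from the same T bootstrap resamples
    (hypothetical inputs; producing them requires re-estimating both models).
    Returns (statistic, bootstrap variance of the difference, p-value).
    """
    if len(boot_hat) != len(boot_tilde) or len(boot_hat) < 2:
        raise ValueError("need two equally long lists with at least 2 replications")
    n_rep = len(boot_hat)
    # Difference between the two estimators in each replication.
    diffs = [h - s for h, s in zip(boot_hat, boot_tilde)]
    mean_diff = sum(diffs) / n_rep
    # Sample variance of the differences across replications (divisor T - 1).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n_rep - 1)
    # Quadratic form; for a scalar this is the squared difference over its variance.
    stat = (beta_hat - beta_tilde) ** 2 / var_diff
    # Upper-tail probability of a chi-squared(1) variable: P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, var_diff, p_value
```

A small p-value would indicate that the two specifications yield systematically different co-authorship coefficients.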

Table 1.8: First Stage Regressions

Variables / Models                    (1) Prop. of          (2) Avg. N. of    (3) Other Tie
                                      Co-authored Papers    Coauthors         Duration

Res. Ov. with Potential Coauth.       .0255                 .0328             .02471
                                      (.0065)               (.0097)           (.0083)
(Res. Ov. with Potential Coauth.)     .0138                 .0242             .0134
                                      (.0077)               (.0118)           (.0102)
Experience                            .0405                 .0390             .0483
                                      (.0103)               (.0135)           (.0132)
(Experience)                          .0005                 .0007             .0005
                                      (.0001)               (.0001)           (.0001)
Degree of Specialization              .0261                 .0435             .0012
                                      (.0097)               (.0139)           (.0069)
Avg. Number of Coauthors' Papers      .0180                 .0198             .0314
                                      (.0007)               (.0010)           (.0009)
Avg. Coauthors' Productivity          .0003                 .0005             .0003
                                      (.0001)               (.0001)           (.0001)
Avg. Coauthors' Coauthor Prod.        .0005                 .0008             .0007
                                      (.0000)               (.0000)           (.0000)
Author's Last Year Publication        .0052                 .0175             .0087
                                      (.0036)               (.0052)           (.0041)

Year Dummies                          YES                   YES               YES
Number of authors                     10835                 10835             15233
Number of observations                70794                 70794             91917

Note: Column 1 presents the first stage results using the proportion of co-authored papers as the measure of co-authorship. Column 2 shows the first stage results using the average number of coauthors per article as the co-authorship variable. Column 3 shows the first stage results using a three-year co-authorship tie duration. Standard errors, adjusted for clusters, are in parentheses. Significant at 1% level, Significant at 5%, Significant at 10%.

Table 1.9: Summary statistics across individual types

Variables / Percentiles                             >50%               ≤50%
                                                    mean     st.d.     mean     st.d.

Avg. Productivity from t-4 to t                     7.76     13.19     2.27     6.21
Co-authorship                                       .56      .53       .55      .39
Experience                                          12.76    6.12      9.51     4.29
Avg. Coauthors' Prod.                               23.95    45.59     8.73     24.51
Avg. Coauthors' Coauthor Prod.                      20.42    35.89     8.67     22.31
Avg. Number of Coauthors' Papers                    2.97     3.52      2.27     3.28
Research overlap with the Coauthors of Coauthors    .30      .32       .24      .32
Degree of Specialization                            .33      .22       .37      .24
Number of Observations                              80106    80106     50170    50170
Number of Authors                                   8902     8902      7902     7902

Note: Column 1 shows the summary statistics for authors whose first-publication productivity is above the median. Column 2 presents the summary statistics for authors whose first-publication productivity is equal to or below the median. The sample of articles analyzed here is from 1980-1999 and only considers active authors.

Chapter 2
Social networks and research output

2.1 Introduction

Good recruitment requires an accurate prediction of a candidate's potential future performance. Sports clubs, academic departments, and business firms routinely use past performance as a guide to predict the potential of applicants and to forecast their future performance. In this chapter the focus is on researchers. Social interaction is an important aspect of research activity: researchers discuss and comment on each other's work, they assess the work of others for publication and for prizes, and they join forces to coauthor publications. Scientific collaboration involves the exchange of opinions and ideas and facilitates the generation of new ideas. It follows that the characteristics of one's collaborators and the general structure of the collaboration network may reveal useful information about future productivity. We also expect that access to new and original ideas helps researchers be more productive. So we would expect that, other things being equal, highly connected individuals or individuals who are central in the network are more likely to be productive in the future.
Centrality and proximity themselves arise out of links created by individuals, and so they reflect individual characteristics such as ability, sociability, and ambition. For instance, collaboration with highly productive coauthors may reveal that these coauthors find such collaboration worthwhile. Since the ability of a researcher is imperfectly known, the existence of such ties may by itself be informative.
The above considerations suggest that the collaboration network is related to research output in two ways: first, the network serves as a conduit of ideas and, second, the network signals the individual quality of researchers. Whereas the former suggests a causal relationship between network and research output, the latter suggests that the network is merely a reflection of individual characteristics. As is well known in the literature on social interactions (Manski, 1993; Moffitt, 2001), identifying network effects in a causal sense is very difficult in the absence of randomized experiments.
In this chapter we take an alternative route: we focus on the predictive power of social networks in terms of future research output. That is, we investigate how much current and past information on collaboration networks contributes to forecasting future research output. Causality in this sense, that of predictive informativeness, is known as Granger causality. Analysis of this type is quite common in the macroeconometrics literature; see, for example, Stock and Watson (1999), who investigate the predictive power of the unemployment rate and other macroeconomic variables in forecasting inflation.1 Finding Granger causality of network variables on future output cannot be interpreted as evidence of network effects in the traditional causal sense. Nonetheless, it implies that knowledge of a researcher's network can potentially be used by an academic department in making recruitment decisions.
We apply this methodology to evaluate the predictive power of collaboration networks among economists for their future research output, measured by future economics publications. We first ask whether social network measures help predict future research output beyond the information contained in individual past performance. We then investigate which specific network variables are informative and how their informativeness varies over a researcher's career.
Our first set of findings is about the information value of networks. We find that incorporating information about coauthor networks leads to an improvement in the accuracy of forecasts of individual output, over and above what we can predict based on knowledge of past individual output. The effect is significant but modest: the root mean squared error in predicting future productivity falls from 0.677 to 0.663, while the R² increases from 0.417 to 0.442. We also observe that several network variables, such as the productivity of coauthors, closeness centrality, and the number of coauthors, have predictive power. Of these, the productivity of coauthors is the most informative. Tables 2.2-2.4 and Figure 2.2 provide estimates of these effects.

1 A few examples of applications that have determined the appropriateness of a model based on its ability to predict are Swanson (1998), Sullivan et al. (1999), Lettau and Ludvigson (2001), Rapach and Wohar (2002) and Hong and Lee (2003).
Second, the predictive power of network information varies over a researcher's career: it is stronger for young researchers and declines systematically with career time. By contrast, information on recent past output remains a strong predictor of future output over an author's entire career. As a result, fifteen years after the onset of a researcher's publishing career, networks do not have any predictive value for future research output over and above what can be predicted using recent and past output alone. Figures 2.3 and 2.4 illustrate these patterns in our data.
Our third set of findings is about the relation between author ability and the predictive value of networks. We partition individual authors in terms of past productivity and examine the extent to which network variables predict their future productivity. We find that the predictive value of network variables is non-monotonic with respect to past productivity. Network variables do not predict the future productivity of individuals with below-average initial productivity. They are somewhat informative for individuals in the highest past-productivity tier. But they are most informative about individuals in between. In fact, for these individuals, networks contain more information about their future productivity than recent research output. The relation between network variables and author productivity type is illustrated in Figures 2.5 and 2.6. Taken together, these results suggest that academic recruiters would benefit from gathering and analyzing information about the coauthor network of young researchers, especially those who are relatively productive.
We believe that the analysis of the predictive power of collaboration networks is useful in the absence of randomized experiments. Naturally, however, we would also like to disentangle the two channels that may create this predictive power: the network as a conduit of ideas versus network links as a signal of an individual's quality. We therefore provide some tentative results on the relative importance of each channel. We consistently find that the signaling channel is more important than the conduit of ideas channel.
This chapter is a contribution to the empirical study of social interactions. Traditionally, economists have studied the question of how social interactions affect behavior across well-defined groups, paying special attention to the difficulty of empirically identifying social interaction effects. For an overview of this work, see for instance Moffitt (2001) and Glaeser and Scheinkman (2002). In recent years, interest has shifted to the ways in which the architecture of social networks influences behavior and outcomes.2 Recent empirical papers on network effects include Bramoullé, Djebbari and Fortin (2009), Conley and Udry (2008), Calvó-Armengol, Patacchini and Zenou (2008), and Fafchamps, Goyal and van der Leij (2010). Identification of network effects is difficult, as links in a network are endogenous and may be correlated with unobservable characteristics of individuals and links. In this chapter we take an alternative route: we focus instead on the predictive power of social networks in terms of future research output.
This chapter is also related to the more specialized literature on research productivity. We mention two recent papers in this area, Azoulay et al. (2010) and Waldinger (2010). Both papers use the unanticipated removal of individuals as a natural experiment to measure network effects on researchers' productivity. Azoulay et al. (2010) study the effects of the unexpected death of superstar life scientists. Their main finding is that coauthors of these superstars experience a 5% to 8% decline in their publication rate. Waldinger (2010) studies the dismissal of Jewish professors from Nazi Germany in 1933/34. His main finding is that a fall in the quality of faculty has significant and long-lasting effects on outcomes for research students. This chapter quantifies the predictive power of network information over and above the information contained in past output. We also make some progress in identifying the relative magnitude of the different potential roles (signalling and flow of ideas) that a network may play.
2 For a survey of the theoretical work on social networks see Goyal (2007), Jackson (2009) and Vega-Redondo (2007).

2.2 Framework

It is standard practice in most organizations to look at the past performance of job candidates as a guide to their future output. This is certainly true for the recruitment and promotion of researchers, possibly because research output (i.e., journal articles and books) is publicly observable.
The practice of looking at past performance appears to rest on two ideas. The first is that a researcher's output largely depends on ability and effort. The second is that individuals are aware of the relationship between performance and reward and consequently exert effort consistent with their career goals and ambition. This potentially creates a stable relationship between ability and ambition on the one hand, and individual performance on the other. Given this relationship, it is possible to (imperfectly) predict future output on the basis of past output. In this chapter we start by asking how well past performance predicts future output.
We then ask if future output can be better predicted if we include information about an individual's social network. Social interaction among researchers takes a variety of forms, some of which are more tangible than others. Our focus is on social interaction reflected in the coauthorship of a published paper. This is a concrete and quantifiable form of interaction. Coauthorship of academic articles in economics rarely involves more than four authors, so it is likely that coauthorship entails personal interaction. Moreover, given the length of papers and the duration of the review process in economics, it is reasonable to suppose that collaboration entails communication over an extended period of time. These considerations (personal interaction and sustained communication) in turn suggest several ways in which someone's coauthorship network can reveal valuable information on their future productivity. We focus on two: research networks as a conduit of ideas, and coauthorship as a signal of unobserved ability and career objectives.
Let us first focus on the role of the research network as a conduit for ideas. Communication in the course of research collaboration involves the exchange of ideas. So we expect that a researcher who is collaborating with highly creative and productive people has access to more new ideas. This, in turn, suggests that a researcher who is close to more productive researchers may have early access to new ideas. As early publication is a key element in the research process, early access to new ideas can lead to greater productivity. These considerations lead us to postulate that, other things being equal, an individual who is in close proximity to highly productive authors will on average have greater future productivity. Proximity need not be immediate, however: if A coauthors with B and B coauthors with C, then ideas may flow from A to C through their common collaborator B. The same argument can be extended to larger network neighborhoods. It follows that authors who are more central in the research network are expected to have earlier and better access to new research ideas.
As a first step we look at how the productivity of an individual, say i, varies with the productivity of his or her coauthors. We then examine whether i's future productivity depends on the past productivity of the coauthors of his or her coauthors. Finally, we generalize this idea to i's centrality in the network, in terms of how close a researcher is to all other researchers (closeness) or how critical a researcher is to connections among other researchers (betweenness), the idea being that centrality gives privileged access to ideas that can help a researcher's productivity.
Access to new ideas may open valuable opportunities, but it takes ability and effort to turn a valuable idea into a publication in an academic journal. It is reasonable to suppose that the usefulness of new ideas varies with ability and effort. In particular, a more able researcher is probably better able to turn the ideas accessed through the network into publications than a less able researcher. Since ability and industriousness are reflected to some extent in past performance, we expect the value of network access to vary with past performance. To investigate this possibility, we partition researchers into different tier groups based on their past performance and examine whether the predictive power of having productive coauthors and other related network variables varies systematically across tier groups.
The second way in which network information may help predict future output is that the quantity and quality of one's coauthors is correlated with, and thus can serve as a signal for, an individual's hidden ability and ambition. Given the commitment of time and effort involved in a research collaboration, it is reasonable to assume that researchers do not casually engage in a collaborative research venture. Hence when a highly productive researcher forms and maintains a collaboration with another, possibly junior, researcher i, this link reveals positive attributes of i that could not be inferred from other observable data. Over time, however, evidence on i's performance accumulates, and residual uncertainty about i's ability and industriousness decreases. We therefore expect the signal value of network characteristics to be higher at the beginning of a researcher's career and to fall afterwards.

2.2.1 Empirical strategy

Our empirical strategy is based on the above ideas. Since our focus is on predictive power, we worry that overfitting may bias inference. To avoid this, we divide the sample into two halves, one of which is used to obtain parameter estimates, and the other to assess the out-of-sample predictive power of these estimates. We thus begin by randomly dividing the authors into two groups of equal size. The first half of the authors is used to estimate a regression model of researcher output. We then use the coefficients estimated on the first half to predict researcher output for the authors in the second half of the data, and compare these predictions with actual output.
The purpose of this procedure is to assess the out-of-sample prediction performance of the model. The reason for using out-of-sample predictions is that in-sample errors are likely to understate forecasting errors. As stated by Fildes and Makridakis (1995), the performance of a model on data outside that used in its construction remains the touchstone for its utility in all applications regarding predictions. Another drawback of in-sample tests is that they tend to reject the null hypothesis of no predictability too often. In other words, in-sample tests of predictability may spuriously indicate predictability when there is none.3
3 Arguments in favour of using out-of-sample predictions can be found in Ashley et al. (1980), who state that a sound and natural approach to testing predictability must rely primarily on the out-of-sample forecasting performance of models relating the original series of interest (page 1149). Along with Fair and Shiller (1990), they also conjecture that out-of-sample inference is more robust to model selection biases and to overfitting or data mining. Inoue and Kilian (2004) provide analytical and Monte Carlo evidence that neither data mining nor parameter instability is a plausible explanation of the observed tendency of in-sample tests to reject the no-predictability null more often than out-of-sample tests. Applications that have determined the appropriateness of a model based on its ability to predict out-of-sample include Swanson (1998), Sullivan et al. (1999), Lettau and Ludvigson (2001), Rapach and Wohar (2002) and Hong and Lee (2003).

The rest of this section develops some terminology and presents the regressions more formally. We begin by describing the first step of our procedure and then explain how we assess prediction performance. The dependent variable of interest is a measure $y_{it}$ of the future output of author i, defined in more detail in the data section. This measure takes into account the number of articles published, the length of each article, and the ranking of the journal in which the article appears.
We first study predictions of $y_{it}$ based on past output and a set of controls $x_{it}$. Control variables include: cumulative output from the start $t_{i0}$ of i's career until $t-5$; career time $c_{i,t} = t - t_{i0}$ and its square; a time trend $t$; and the number of years since i's last publication. Squared career time $c_{i,t}^2$ is included to capture career cycle effects, i.e., that researchers publish less as they approach retirement. We then examine by how much recent research output and network characteristics improve the prediction. We also compare the accuracy of the prediction when we use only past output and when we combine it with recent network characteristics.
The order of the regression models we estimate is as follows. We start with the benchmark Model 0, which examines the predictive power of the control variables $x_{it}$:

Model 0
\[ y_{i,t+1} = x_{it}\beta + \varepsilon_{it} \]

We then include past output $y_{it}$ as an additional regressor. This yields Model 1:

Model 1
\[ y_{i,t+1} = x_{it}\beta + y_{it}\gamma_1 + \varepsilon_{it} \]

In Model 2 we investigate the predictive power of network variables $z_{it}$:

Model 2
\[ y_{i,t+1} = x_{it}\beta + z_{it}\gamma_2 + \varepsilon_{it} \]

Network variables include the number of i's coauthors up to time t, the productivity of these coauthors, and different network centrality measures detailed in the empirical section. We estimate Model 2 first with one network variable at a time, then including network variables simultaneously.

Finally, in Model 3 we ask if network variables $z_{it}$ improve the prediction of future output over and above the prediction obtained from Model 1, that is, from past productivity:

Model 3
\[ y_{i,t+1} = x_{it}\beta + y_{it}\gamma_1 + z_{it}\gamma_2 + \varepsilon_{it} \]

Here too we first consider one network variable at a time to ascertain which network characteristics have more predictive power. We also estimate Model 3 with several network variables together to evaluate the overall information contained in the network.
Models 0, 1 and 2 are nested in Model 3. A comparison of Models 1 and 2 allows us to investigate the relative information content of recent individual output and the recent social network. A comparison of Models 1 and 3 examines whether social network variables have explanatory power over and above the information contained in recent individual output.
Since our ultimate purpose is to predict research output, we need a criterion to select a parsimonious set of regressors so as to avoid overfitting. To select among social network regressors we use the Bayesian Information Criterion (BIC). We find that, in our case, the lowest value of the BIC is obtained when all the network variables are included, which is why our final specification includes them all.
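A minimal sketch of BIC-based selection follows. The text does not state which form of the BIC was computed, so the version below, the usual Gaussian linear-regression form up to an additive constant, is an assumption; the function names and the candidate labels in the usage note are hypothetical.

```python
import math


def bic_ols(rss, n_obs, n_params):
    """BIC of a Gaussian linear regression, up to an additive constant.

    rss: residual sum of squares; n_obs: number of observations;
    n_params: number of estimated coefficients.
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    # Fit term falls as the residuals shrink; penalty grows with each regressor.
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def select_by_bic(candidates):
    """candidates: dict mapping a label to (rss, n_obs, n_params).

    Returns the label of the specification with the lowest BIC.
    """
    return min(candidates, key=lambda name: bic_ols(*candidates[name]))
```

For instance, `select_by_bic({"one_network_var": (12.0, 100, 4), "all_network_vars": (9.0, 100, 8)})` picks the larger specification only because its improvement in fit outweighs the penalty for four extra coefficients.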
This describes the first step of our analysis. In the second step we evaluate the predictive accuracy of the different models. To this end we compare, in the second half of the data, actual research output $y_{i,t+1}$ with the predictions $\hat{y}_{i,t+1}$ obtained by applying the coefficients of Models 0 to 3, estimated on the first half of the data, to authors in the second half. To evaluate the prediction accuracy of $\hat{y}_{i,t+1}$ we report the root mean squared error (RMSE), defined as:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i,t} \left( y_{i,t+1} - \hat{y}_{i,t+1} \right)^2 }. \]

If the introduction of an explanatory variable in $\hat{y}_{i,t+1}$ decreases the out-of-sample RMSE, this variable contains useful information that helps predict researchers' future productivity.
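The two-step procedure can be sketched as follows: authors are split at random into an estimation half and a holdout half, a linear model is fitted on the first, and the RMSE is computed on the second. This is an illustrative sketch only. All function names are hypothetical, the least-squares solver is a plain normal-equations routine chosen to keep the example dependency-free, and the split is by author (not by observation), as described in the text.

```python
import math
import random


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]


def predict(X, coef):
    """Fitted values for each row of regressors."""
    return [sum(c * v for c, v in zip(coef, row)) for row in X]


def rmse(actual, predicted):
    """Root mean squared error over all author-year observations."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def split_authors(author_ids, seed=0):
    """Randomly split the distinct authors into two halves."""
    ids = sorted(set(author_ids))
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def out_of_sample_rmse(author_ids, X, y, seed=0):
    """Estimate on one half of the authors, evaluate on the other half."""
    est, hold = split_authors(author_ids, seed)
    X_est = [x for a, x in zip(author_ids, X) if a in est]
    y_est = [v for a, v in zip(author_ids, y) if a in est]
    X_hold = [x for a, x in zip(author_ids, X) if a in hold]
    y_hold = [v for a, v in zip(author_ids, y) if a in hold]
    coef = ols_fit(X_est, y_est)
    return rmse(y_hold, predict(X_hold, coef))
```

Calling `out_of_sample_rmse` once with the regressors of Model 1 and once with those of Model 3, using the same seed, gives the kind of comparison described above: a lower holdout RMSE for the richer model indicates that the added variables carry predictive information.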
In order to assess whether forecasts from two models are significantly different we use the Diebold and Mariano (1995) test. This test is based on the loss differential of forecasting the future output of an individual i, $d_{i,t}$. As we measure the accuracy of each forecast by a squared error loss function (RMSE), we apply the Diebold-Mariano test to the squared loss differential, that is,

\[ d_{i,t} = \varepsilon^2_{A,i,t} - \varepsilon^2_{B,i,t}, \]

where A is a competing model and B is the benchmark model.
To determine if one model predicts better we test the null hypothesis,

\[ H_0: E[d_{i,t}] = 0, \]

against the alternative,

\[ H_1: E[d_{i,t}] \neq 0. \]

Under the null hypothesis, the Diebold-Mariano test statistic is

\[ \frac{\bar{d}}{\sqrt{\operatorname{avar}(\bar{d})}} = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})/n}} \sim N(0,1), \]

where

\[ \bar{d} = n^{-1} \sum_{i,t} d_{i,t}, \]

and $\hat{V}(\bar{d})$ is a consistent estimate of the asymptotic (long-run) variance of $\sqrt{n}\,\bar{d}$.
Our dependent variable, $q^f_{i,t}$, uses information on future output from t + 1 to t + 3, which leads to serial correlation in the loss differentials. We adjust for this serial correlation by using a Newey-West type estimator of $V(\bar{d})$:

\[ \hat{V}(\bar{d}) = \sum_{i} \left( \hat{\gamma}_0 + 2 \sum_{\tau=1}^{T-t} w_{m(T)} \hat{\gamma}_{\tau} \right), \qquad \hat{\gamma}_{\tau} = \widehat{\operatorname{Cov}}(d_{i,t}, d_{i,t-\tau}), \]

where $w_{m(T)}$ is the so-called Bartlett kernel function:

\[ w_{m(T)} = \begin{cases} 1 - \dfrac{\tau}{m(T)} & \text{if } 0 \le \dfrac{\tau}{m(T)} \le 1, \\ 0 & \text{otherwise,} \end{cases} \]

and $m(T)$, also known as the truncation lag, is a number growing with T, the number of periods in the panel. The truncation lag has been chosen by the AIC and BIC.
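The test can be sketched as below for a panel of forecast errors. This is a simplified illustration under stated assumptions rather than the thesis implementation: the function names are hypothetical, autocovariances are pooled across authors and divided by the total number of observations (one of several possible normalizations), the Bartlett weight follows the kernel written above, and the truncation lag `m` is passed in by the caller instead of being selected by AIC or BIC.

```python
import math


def bartlett_weight(tau, m):
    """Bartlett kernel weight 1 - tau/m on [0, 1], zero elsewhere."""
    ratio = tau / m
    return 1.0 - ratio if 0.0 <= ratio <= 1.0 else 0.0


def diebold_mariano(errors_a, errors_b, m):
    """Diebold-Mariano statistic for squared-error loss on panel data.

    errors_a, errors_b: dicts mapping an author id to that author's forecast
    errors, ordered by time, for the competing model A and the benchmark B.
    m: truncation lag (a positive integer).
    Returns (statistic, two-sided p-value from the standard normal).
    """
    if m < 1:
        raise ValueError("truncation lag m must be at least 1")
    # Squared loss differential for every author-period.
    d = {i: [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a[i], errors_b[i])]
         for i in errors_a}
    n = sum(len(series) for series in d.values())
    d_bar = sum(sum(series) for series in d.values()) / n

    def gamma(tau):
        # Pooled autocovariance at lag tau, using only within-author pairs.
        total = 0.0
        for series in d.values():
            for t in range(tau, len(series)):
                total += (series[t] - d_bar) * (series[t - tau] - d_bar)
        return total / n

    long_run_var = gamma(0) + 2.0 * sum(
        bartlett_weight(tau, m) * gamma(tau) for tau in range(1, m + 1))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

With this sign convention, a significantly negative statistic means that the competing model A has smaller squared forecast errors than the benchmark B.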

2.3 Data

The data used for this chapter are drawn from the EconLit database, a bibliography of journals in economics compiled by the editors of the Journal of Economic
Literature. From this database we use information on all articles published between 1970 and 1999. These data are the same as those analyzed by Goyal, van
der Leij, and Moraga-Gonzalez (2006), Fafchamps, Goyal and van der Leij (2010)
and van der Leij and Goyal (2011).

2.3.1 Definition of variables

The output $q_{it}$ of author i in year t is defined as:

\[ q_{it} = \sum_{j \in S_{it}} \frac{\text{pages}_j \times \text{journal quality}_j}{\text{Number of coauthors}_j} \qquad (2.1) \]

where Sit is the set of articles j of individual i published in year t. The variable
pagesj is the number of pages of article j divided by the average number of pages
of articles published in the journal.5 When available, the Journal quality variable
is taken from the work of Kodrzycki and Yu (2006) hereafter KY. Unfortunately,
KY do not include in their analysis all the journals in the EconLit database. To
avoid losing information and minimize measurement error in research output,
we construct a prediction of the KY quality index of journals not included in
their list.6 The actual KY journal quality index is used whenever available. For
5

The number of pages is truncated above fifty pages to correct for a small number of unusually long published articles. Overly long papers are usually literature review articles. Hence
not truncating above fifty pages would probably overrepresent their contribution.
6
To do this, we regress the KY index on commonly available information of each journal listed
in EconLit, such as the number of published articles per year, the impact factor, the immediacy

47

2. SOCIAL NETWORKS AND RESEARCH OUTPUT

comparison purposes, qit assigns 28 points to a single-authored 9 page article or


a 18 page two-author paper in the American Economic Review.
In (2.1) we divide the quality of a journal article by the number of coauthors
to obtain the output of author i. The rationale behind this calculation is the work
of Sauer (1988), who uses wage data to estimate the implicit discount factor that
employers apply to two-author papers compared to single-authored papers. Sauer
estimates this discount factor to be between 0.429 and 0.689, with a mean of 0.55
which is not very different from 1/2, or one divided by the number of coauthors.
We are interested in predicting future output. In economics, the annual number of papers per author is small and affected by erratic publication lags. We
therefore need a reasonable time window over which to aggregate output. The
results presented here are based on a three year window, but our findings are insensitive to the use of alternative window lengths, e.g., five years. Our dependent
variable of interest is thus the output of author i in years t + 1, t + 2, t + 3:7
$$q^{f}_{it} = \frac{q_{i,t+1} + q_{i,t+2} + q_{i,t+3}}{3} \qquad (2.2)$$

Unsurprisingly, the distribution of $q^{f}_{it}$ has a fat upper tail. To prevent our results from being entirely driven by a handful of highly productive individuals, we take the logarithm of the dependent variable as follows:8

$$y_{i,t+1} = \ln\left(1 + q^{f}_{it}\right)$$

The analysis presented in the rest of the paper uses $y_{i,t+1}$ as dependent variable.
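As an illustration only, the construction of the dependent variable can be sketched as follows. The function names and the dictionary layout (year mapped to annual output $q_{it}$) are assumptions made for this sketch and are not taken from the thesis.

```python
import math

def future_output(q, t, window=3):
    """Average output of one author over years t+1 .. t+window, as in (2.2).

    q is a hypothetical dict mapping year -> annual output q_it;
    years without any publication are simply absent from the dict.
    """
    return sum(q.get(t + k, 0.0) for k in range(1, window + 1)) / window

def dependent_variable(q, t, window=3):
    """y_{i,t+1} = ln(1 + q^f_it); log1p keeps zero future output well defined."""
    return math.log1p(future_output(q, t, window))

# Made-up author with output 6 in 2001, nothing in 2002 and 3 in 2003.
q_i = {2001: 6.0, 2003: 3.0}
q_f = future_output(q_i, 2000)    # (6 + 0 + 3) / 3
y = dependent_variable(q_i, 2000)
```

Setting `window=5` gives the five year average used as an alternative window length.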
6 (continued) index, the Tinbergen Institute Index, an economics dummy, interaction terms between the economics dummy and the impact factor, and various citation measures. Estimated coefficients from this regression are then used to obtain a predicted KY journal quality index for journals not in their list. Since most of the journals that KY omitted are not highly ranked, their predicted quality index is quite small.

7 $q^{f}_{it}$ is averaged to facilitate comparison with alternative definitions over different time intervals.

8 We have considered other functional forms such as y 15, y 50, y 75, log of log, and Tobit. In a scatter plot between actual output and fitted output, the specification that provides the best fit is ln(x + 1), which is the one we report here. In particular, the fit quality is much better in logs than in levels.

We expect recent productivity to better predict output over the next three years than older output. To capture this idea, we divide past output into two parts: cumulative output until period $t-5$, which captures $i$'s historical production and is used as control variable; and output from $t-4$ until $t$, which represents $i$'s recent productivity and is expected to be a strong predictor of future output. We define recent output $q^{r}_{it}$ from $t-4$ to $t$ as:

$$q^{r}_{it} = q_{it} + q_{i,t-1} + q_{i,t-2} + q_{i,t-3} + q_{i,t-4}$$

Control variables $x_{it}$ include cumulative output $q^{c}_{it}$ from the start $t_{i0}$ of $i$'s career until $t-5$:

$$q^{c}_{it} = q_{i,t_{i0}} + \ldots + q_{i,t-6} + q_{i,t-5}$$
where $t_{i0}$ is the year in which individual $i$ obtained his or her first publication. We use $\log(1 + q^{r}_{it})$ and $\log(1 + q^{c}_{it})$ as regressors, since the distribution of both variables contains fat tails. We also include the number of years $r_{it}$ with no published article since $i$'s last article was published:

$$r_{it} = \begin{cases} 0 & \text{if } q_{it} > 0 \\ r_{i,t-1} + 1 & \text{otherwise} \end{cases} \qquad \text{and} \qquad r_{i,t_{i0}} = 0$$

Variable $r_{it}$ is used as proxy for leave or retirement from academia: the longer someone has not published, the more likely he or she has retired or left research. Other controls include career time $c_{it} = t - t_{i0}$, career time squared, and a time trend $t$. To summarize, the control variables are $x_{it} = \{q^{c}_{it}, r_{it}, c_{it}, c^{2}_{it}, t\}$.
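The regressors defined above can be sketched in a few lines. This is a minimal illustration, assuming that an author's record is stored as a dictionary from year to annual output with strictly positive entries only; the helper names are hypothetical.

```python
def first_year(q):
    """t_i0: year of the author's first publication (q holds only years with output > 0)."""
    return min(q)

def recent_output(q, t):
    """q^r_it: output summed over the five years t-4 .. t."""
    return sum(q.get(t - k, 0.0) for k in range(5))

def cumulative_output(q, t):
    """q^c_it: output summed from t_i0 up to and including t-5."""
    return sum(q.get(s, 0.0) for s in range(first_year(q), t - 4))

def years_without_article(q, t):
    """r_it: 0 in a year with output, previous value plus one otherwise; r = 0 at t_i0."""
    r = 0
    for s in range(first_year(q), t + 1):
        r = 0 if q.get(s, 0.0) > 0 else r + 1
    return r

def career_time(q, t):
    """c_it = t - t_i0."""
    return t - first_year(q)

# Made-up record: publications in 1990, 1993 and 1997, evaluated at t = 1999.
q_i = {1990: 4.0, 1993: 2.0, 1997: 1.0}
controls = (cumulative_output(q_i, 1999), years_without_article(q_i, 1999), career_time(q_i, 1999))
```

For this record, recent output only counts the 1997 article, cumulative output counts the 1990 and 1993 articles, and the author has gone two years without an article.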
Next we turn to the network variables. Given that we wish to investigate whether network characteristics have predictive power over and above that of recent productivity, network variables must be constructed in such a way that they do not contain information outside the time window of $q^{r}_{it}$. We therefore define the coauthorship network $G_t$ at time $t$ over the same time window as $q^{r}_{it}$, that is, using all joint publications from year $t-4$ to $t$. At time $t$, two authors $i$ and $j$ are said to have a link, $g_{ij,t} = 1$, in $G_t$ if they have published a joint article in an EconLit journal in years $t-4$ to $t$. Otherwise $g_{ij,t} = 0$.
The set of network statistics that we construct from $G_t$ is motivated by the theoretical discussion of Section 2. Some of the network statistics we include in our analysis are, on a priori grounds, more correlated with access to new scientific ideas; others are included because they are thought to have a high signalling potential. Measures of network topology such as centrality and degree reflect network proximity and thus belong primarily to the first category, while other measures, such as the productivity of coauthors, are likely to have greater signalling potential.
Based on these observations, the list of network variables that we use in the analysis is as follows. We say that there is a path between $i$ and $j$ in $G_t$ if $g_{ij,t} = 1$ or there exists a set of distinct nodes $j_1, \ldots, j_m$ such that $g_{ij_1,t} = g_{j_1 j_2,t} = \ldots = g_{j_m j,t} = 1$. The length of such a path is $m + 1$. The distance $d(i, j; G_t)$ is the length of the shortest path between $i$ and $j$ in $G_t$. We use the following standard definitions:

(First order) degree is the number of coauthors that $i$ has in period $t-4$ to $t$, $n^{1}_{i,t} = |N_i(G_t)|$, where $N_i(G_t) = \{j : g_{ij,t} = 1\}$.

Second order degree is the number of nodes at distance 2 from $i$ in period $t-4$ to $t$, $n^{2}_{i,t} = |N^{2}_i(G_t)|$, where $N^{2}_i(G_t) = \{k : d(i, k; G_t) = 2\}$.

Giant component: The giant component in $G_t$ is the largest subset of nodes such that there exists a path between each pair of nodes in the giant component, and no path to a node outside. We create a dummy variable which takes value 1 if an author belongs to the giant component and 0 otherwise.

Within the giant component we consider the following two global proximity measures.9
9 For a careful discussion on the interpretation of centrality measures, see Wasserman and Faust (1994).

Closeness centrality $C^{c}_{i,t}$ is the inverse of the average distance of a node to other nodes within the giant component and is defined as:

$$C^{c}_{i,t} = \frac{n_t - 1}{\sum_{j \neq i} d(i, j; G_t)}$$

where $n_t$ is the size of the giant component in year $t$. Because $C^{c}_{i,t}$ has fat tails, we use $\log(1 + C^{c}_{i,t})$ as regressor instead.

Betweenness centrality $C^{b}_{i,t}$ is the frequency of shortest paths passing through node $i$ and is calculated as:

$$C^{b}_{i,t} = \sum_{j \neq k:\, j,k \neq i} \frac{\tau^{i}_{j,k}(G_t)}{\tau_{j,k}(G_t)}$$

where $\tau^{i}_{j,k}(G_t)$ is the number of shortest paths between $j$ and $k$ in $G_t$ that pass through node $i$, and $\tau_{j,k}(G_t)$ is the total number of shortest paths between $j$ and $k$ in $G_t$. In the regression analysis, we similarly use $\log(1 + C^{b}_{i,t})$ as regressor.
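The definitions above can be illustrated with a small, self-contained sketch based on breadth-first search. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements about the thesis: each unordered pair of nodes is counted once in the betweenness sum, and both centrality measures are set to zero for authors outside the giant component. The pairwise loop is only meant for toy networks.

```python
from collections import deque
from itertools import combinations

def build_network(papers):
    """papers: iterable of author tuples, one per article published in years t-4 .. t."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def bfs(graph, source):
    """Return (dist, paths): distances and numbers of shortest paths from source."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths

def network_stats(graph):
    info = {v: bfs(graph, v) for v in graph}
    # The set of nodes reachable from v is v's component; keep the largest one.
    giant = max((set(info[v][0]) for v in graph), key=len)
    stats = {}
    for i in graph:
        dist_i, paths_i = info[i]
        s = {"degree": len(graph[i]),
             "degree2": sum(1 for d in dist_i.values() if d == 2),
             "giant": int(i in giant), "closeness": 0.0, "betweenness": 0.0}
        if i in giant and len(giant) > 1:
            s["closeness"] = (len(giant) - 1) / sum(dist_i.values())
            for j, k in combinations(giant - {i}, 2):
                dist_j, paths_j = info[j]
                # i lies on a shortest j-k path iff d(j,i) + d(i,k) = d(j,k).
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    s["betweenness"] += paths_j[i] * paths_i[k] / paths_j[k]
        stats[i] = s
    return stats

# Toy network: a chain A-B-C-D plus an isolated pair E-F.
stats = network_stats(build_network([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]))
```

In the toy network, author B has two coauthors, one author at distance 2, and lies on the shortest paths between A and C and between A and D.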

Next, we define regressors that capture the productivity of coauthors and that of coauthors of coauthors. We apply the ln(x + 1) transformation to them as well.

Productivity of coauthors is defined as the output of coauthors of author $i$ from $t-4$ to $t$:

$$q^{1}_{it} = \sum_{j \in N_i(G_t)} q^{r}_{jt}$$

where $q^{r}_{jt}$ is the output of $j$ from period $t-4$ to period $t$ (excluding papers that are coauthored with $i$).

Productivity of coauthors of coauthors is the output of coauthors of coauthors of author $i$ from $t-4$ to $t$:

$$q^{2}_{it} = \sum_{k \in N^{2}_i(G_t)} q^{r}_{kt}$$

where $q^{r}_{kt}$ is the output of $k$ from $t-4$ to $t$, excluding papers that are coauthored with the neighbors of $i$, $N_i(G_t)$.

We also include a dummy variable that takes value 1 for author $i$ if one of $i$'s coauthors in $G_t$ has an output $q^{r}_{jt}$ in the top 1% of the distribution of $q^{r}_{it}$.
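A minimal sketch of the coauthor productivity measure is given below. The paper records, the `score` field (standing in for quality-weighted pages) and the function name are hypothetical; the sketch assumes that each paper's score is split equally among its authors.

```python
def coauthor_productivity(papers, i):
    """Sum of the recent output of i's coauthors, excluding papers written with i.

    papers: list of (authors, score) pairs for years t-4 .. t, where score is a
    hypothetical quality-weighted page count that is split equally among authors.
    """
    coauthors = set()
    for authors, _ in papers:
        if i in authors:
            coauthors |= set(authors) - {i}
    total = 0.0
    for authors, score in papers:
        if i in authors:
            continue  # papers coauthored with i are excluded
        share = score / len(authors)
        # Every coauthor of i on this paper receives one share of its score.
        total += share * len(coauthors & set(authors))
    return total

papers = [(("A", "B"), 10.0), (("B",), 6.0), (("B", "C"), 8.0), (("C",), 5.0)]
q1_A = coauthor_productivity(papers, "A")
```

Here A's only coauthor is B, whose output outside the joint paper is 6 from the solo article plus half of 8 from the article with C.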

2.3.2 Descriptive statistics

Table 2.1 provides summary statistics of the variables included in the analysis. Column 1 provides the mean value of each variable. Column 2 shows the standard deviation and column 3 provides correlations between the different variables and future productivity.

Table 2.1: Summary statistics

                                  Mean      Std. deviation   Correlations
Output
  Future productivity             2.1       7.8              1
  Past stock output               32.7      88               0.42
  Recent past output              14.4      40               0.65
Network variables
  Degree                          1.4       1.8              0.28
  Degree of order two             2.2       4.7              0.29
  Giant component                 .24       .43              0.24
  Closeness centrality            .02       .03              0.27
  Betweenness centrality          1.5       3.7              0.29
  Coauthors productivity          27.7      97               0.48
  Coauthors of Coauthors prod.    61.3      229              0.44
  Working with top 1%             .03       .17              0.34

Number of observations            332863    332863           332863
Number of authors                 75109     75109            75109

Summary statistics for clustering are based on 112916 observations.

A significant fraction of economists in the EconLit database publish very infrequently and do not publish a single article over a five year period. For that reason, we have excluded from our sample all observations for which recent output $q^{r}_{it}$ from $t-4$ to $t$ is zero. Note that whenever $q^{r}_{it} = 0$, $i$ is by definition not linked to anyone in $G_t$, so these observations are not useful to identify the respective predictive power of recent output $q^{r}_{it}$ and research network.

We also excluded observations relative to authors in the earliest stage of their career, i.e., for which $c_{it} < 6$. The reason is that these authors have not yet established a publication record and network so that there is little information on which to form predictions of future output. Put differently, we do not consider the problem faced by employers when hiring economists in the junior market, where the graduating department, job market paper, and recommendation letters matter more than publication record.
We draw attention to some distinctive features of the data. First, we observe that the variance in future output $q^{f}_{it}$ is large, with a standard deviation 3.71 times the mean. There is a high positive correlation of 0.65 between recent output $q^{r}_{it}$ and future output $q^{f}_{it}$. Figure 2.1 shows a scatter plot and a linear regression line with confidence interval between $q^{f}_{it}$ and $q^{r}_{it}$ for 1000 randomly selected observations. This visually confirms that, as anticipated, recent past output has strong predictive power for future output.

Figure 2.1: A scatter plot of future output and recent past output.
[Scatter plot: future productivity against recent past output, with fitted values and 95% CI.]

Second, we observe a high correlation between $q^{f}_{it}$ and several network variables such as coauthors' output $q^{1}_{it}$, author degree, and closeness and betweenness centrality. The network variables most highly correlated with future productivity are $q^{1}_{it}$ and $q^{2}_{it}$, the productivity of $i$'s coauthors and of coauthors of coauthors: the correlation coefficients are 0.48 and 0.44, respectively. Other network variables such as degree, closeness, and betweenness centrality are also correlated with future output $q^{f}_{it}$, but the correlation coefficient is smaller, around 0.3. Figure 2.2 shows the relationship between some network variables and future output.

2.4 Empirical findings

We have seen that there is a reasonably strong correlation between future output and recent past output, but also between future output and the characteristics of $i$'s recent coauthorship network. We now turn to a multivariate analysis and estimate the different models outlined in Section 2.2. We start by presenting the results on the predictive power of network information. This leads to a closer examination of the role of signalling and flow of ideas in networks. We then examine the relation between the productivity of an individual author and the predictive power of network variables.

Figure 2.2: Scatter plots of future productivity on closeness centrality and coauthors productivity.
[Two scatter plots: future output against closeness centrality, and future productivity against coauthors productivity, each with fitted values and 95% CI.]

2.4.1 Predicting future output

Table 2.2 presents the prediction results for Model 0, the baseline model with controls $x_{it} = \{q^{c}_{it}, r_{it}, c_{it}, c^{2}_{it}, t\}$; Model 1, which includes recent output $q^{r}_{it}$; and Model 2, which includes network variables. Column 1 presents the $R^2$ of the regression on the in-sample data for each model. Column 2 shows the out-of-sample RMSE for each model. Column 3 compares the RMSE of Model 1 or Model 2 with the benchmark model, Model 0. Column 4 shows the coefficient of each regressor.

Table 2.2: Prediction Accuracy: Models 1 and 2

                                  R2     RMSE    RMSE Diff.   Coefficients
Model 0                           .27    .759    -            -
Model 1
  Recent past output              .42    .677    10.8%        .33
Model 2
  Degree                          .31    .738    2.77%        .10
  Degree of order 2               .31    .738    2.75%        .04
  Giant component                 .30    .745    1.83%        .35
  Closeness                       .31    .737    2.80%        .57
  Betweenness                     .32    .734    3.28%        .18
  Coauthors productivity          .33    .724    4.54%        .14
  Coauthors of Coauthors prod.    .32    .731    3.62%        .10
  Working with a top 1%           .31    .738    2.77%        1.05

Significant at 1% level, Significant at 5%.

Recent output $q^{r}_{it}$ explains slightly less than half of the variation in future output $q^{f}_{it}$. More than half of the variation in $q^{f}_{it}$, around 58% of the total variation, remains unexplained after we take $q^{r}_{it}$ into account. The question is: can we improve upon this using network variables?
We begin by examining the predictive power of network variables when added to controls $x_{it}$. This is achieved by comparing Model 2 results with Model 0. Results, presented in Table 2.2, show that coauthor productivity $q^{1}_{it}$, closeness centrality $C^{c}_{i,t}$, and the productivity $q^{2}_{it}$ of coauthors of coauthors are statistically significant and help predict future output. However, their predictive power is much lower than that of recent output. For example, coauthors' productivity reduces the RMSE by 4.5% whereas recent output reduces the RMSE by 10.8%.
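As a worked check of how these percentages relate to the RMSE columns, the sketch below assumes that the reported "RMSE Diff." is the percentage reduction in out-of-sample RMSE relative to Model 0. This reading is an assumption consistent with the tabulated figures; small discrepancies are expected because the tabulated RMSE values are rounded. The function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of a set of out-of-sample predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmse_reduction(rmse_model, rmse_benchmark):
    """Percentage reduction in RMSE relative to the benchmark model (assumed definition)."""
    return 100.0 * (rmse_benchmark - rmse_model) / rmse_benchmark

# Rounded RMSE values of Model 0 and Model 1 as reported in the text.
gain_model_1 = rmse_reduction(0.677, 0.759)
```

With the rounded values 0.677 and 0.759, the computed reduction is approximately 10.8%.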
We then combine recent output $q^{r}_{it}$ and network variables in Model 3. Results presented in Table 2.3 show that the same network variables remain significant once we include $q^{r}_{it}$ as regressor. Being significant does not imply that network variables are very informative, however. For this we have to examine the improvement in prediction that they represent.

Table 2.3: Prediction Accuracy of Model 1 and 3.

                                  R2     RMSE    RMSE Diff.   Coefficients
Model 0                           .27    .759    -            -
Model 1
  Recent past output              .42    .677    10.8%        .33
Model 3
  Degree                          .43    .672    11.4%        .05
  Degree of order 2               .43    .672    11.5%        .02
  Giant component                 .42    .674    11.1%        .15
  Closeness                       .43    .671    11.5%        .34
  Betweenness                     .43    .671    11.5%        .10
  Coauthors productivity          .43    .670    11.7%        .06
  Coauthors of Coauthors prod.    .43    .671    11.6%        .05
  Working with a top 1%           .43    .670    11.7%        .60

Significant at 1% level, Significant at 5%. The Diebold-Mariano test for each version of Model 3 is performed for the RMSE difference between Model 3 and Model 1.

Table 2.4 shows that the $R^2$ of Model 3 is greater than the $R^2$ obtained under Model 1. This means that network information taken in combination with recent output yields a more accurate prediction than a prediction based on past output alone. The gain in explanatory power is small, however: the $R^2$ rises from 0.42 in Model 1 to 0.44 in Model 3. In line with this, the RMSE declines from 0.68 to 0.66 when we incorporate network information; this small difference is statistically significant, as shown by the Diebold-Mariano test. From this we conclude that network variables contain predictive information over and above what can be predicted on the basis of past output, but this information gain is modest.
Table 2.4: Prediction accuracy of the multivariate models

                        R2       RMSE     RMSE Diff.
Model 0                 0.271    0.759
Model 1                 0.417    0.677    10.81%
Multivariate Model 2    0.366    0.707    6.802%
Multivariate Model 3    0.442    0.663    12.65%

Significant at 1% level. The Diebold-Mariano test of the Multivariate Model 3 is performed for the RMSE difference between Multivariate Model 3 and Model 1.
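For readers unfamiliar with the Diebold-Mariano test referred to in the table notes, a basic version can be sketched as follows. This is only an illustration under stated assumptions: squared-error loss, no correction for serial correlation in the loss differential, and a normal approximation for the p-value. The exact variant used for the reported tests is not specified here, and the function name is hypothetical.

```python
import math

def diebold_mariano(errors_model, errors_benchmark):
    """Basic Diebold-Mariano statistic for equal predictive accuracy.

    The loss differential is d = e_model^2 - e_benchmark^2, so a negative
    statistic means the model has the smaller squared forecast errors.
    Returns (statistic, two-sided p-value under a standard normal).
    """
    d = [em ** 2 - eb ** 2 for em, eb in zip(errors_model, errors_benchmark)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Made-up forecast errors: the first model is uniformly more accurate.
stat, p_value = diebold_mariano([0.1, -0.2, 0.1, 0.0], [0.5, -0.6, 0.4, 0.3])
```

Under this sign convention, a negative statistic with a small p-value indicates that the first set of forecasts is significantly more accurate than the benchmark.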

2.4.2 Networks and career cycle

Next we examine the relative importance of the two potential ways in which networks may matter: flow of ideas and signalling. As time passes, the publication record of a researcher builds up. Since ability, research ambition, and other personality traits are relatively stable over time, this accumulating evidence provides a more accurate estimate of the type of the person. Hence it becomes easier to judge his or her ability and research ambition on the basis of the publication record alone. Based on this, we would expect network variables to have less and less additional predictive power.
Research networks can, however, be important conduits of valuable research ideas as well. Unlike the signalling value of networks, access to new research ideas remains important throughout a researcher's career. Thus if network variables help predict future output because they capture access to new ideas, their predictive value should remain relatively unchanged over a researcher's career.
To investigate this issue, we estimated the predictive power of network variables for different career time $c_{it}$. The RMSE of Models 0, 1 and Multivariate Models 2 and 3 are plotted in Figure 2.3.

Figure 2.3: RMSE out-of-sample across Career time.
[Line plot: out-of-sample RMSE (vertical axis) against career time (horizontal axis) for Model 0 (M0), Model 1 (M1), Multivariate Model 2 (MV2) and Multivariate Model 3 (MV3).]

Career age $c_{it}$ is on the horizontal axis while RMSE is measured on the vertical axis. The figure shows that, as anticipated, the predictive accuracy of all the models improves with career time, as reflected in the decline in RMSE. This is primarily because the control variables $x_{it}$, particularly cumulative output $q^{c}_{it}$, reveal more information about individual ability and preferences over time.
To examine whether the relative predictive gain of network variables varies with career time, we report in Figure 2.4 the difference in RMSE between Model 1, Multivariate Models 2 and 3 versus Model 0.
We note a marked decline in the difference between Models 1 and 3 over the course of a researcher's career. After career time $t = 15$, the prediction accuracy of models with or without network variables becomes virtually indistinguishable. As shown in Table 2.5, the differences between Multivariate Model 3 and Model 1 are not statistically significant after $t = 15$. This indicates that, for senior researchers, network variables contain little information over and above the information contained in past and recent output.

Figure 2.4: RMSE % Difference across Career time.
[Line plot: RMSE % difference relative to Model 0 (vertical axis) against career time (horizontal axis) for Model 1 (M1), Multivariate Model 2 (MV2) and Multivariate Model 3 (MV3).]

Table 2.5: Diebold-Mariano test across career time

Career Time    Model 1        Multivariate Model 2    Multivariate Model 3
14             -9.29 (.00)    -6.17 (.00)             -3.33 (.00)
15             -9.51 (.00)    -5.36 (.00)             -1.99 (.05)
16             -8.89 (.00)    -4.55 (.00)             -1.55 (.12)
17             -9.40 (.00)    -3.81 (.00)             -1.13 (.26)
18             -7.42 (.00)    -3.60 (.00)             -1.00 (.32)
19             -7.23 (.00)    -3.22 (.00)             -0.54 (.58)
20             -5.16 (.00)    -2.29 (.02)             -0.71 (.47)
21             -4.67 (.00)    -1.43 (.15)             0.19 (.84)
22             -2.65 (.01)    -1.68 (.09)             0.45 (.65)

p-value in parentheses. The Diebold-Mariano test of the Multivariate Model 3 is performed for the RMSE difference between Multivariate Model 3 and Model 1.

2.4.3 Network information across productivity categories

In this section we pursue these issues further and examine whether the predictive power of network information varies systematically with recent output $q^{r}_{it}$. This analysis is predicated on the idea that it takes talent and dedication to transform the new ideas conveyed by the research network into publishable output. Consequently we expect the predictive power of network variables to increase with ability and hence with $q^{r}_{it}$, at least over a certain range.
To investigate this possibility, we divided the observations into six tier groups on the basis of their recent output $q^{r}_{it}$. The top category includes authors in the top 1% in terms of $q^{r}_{it}$. The second category includes authors in the 95-98 percentiles of $q^{r}_{it}$. The third category covers authors in the 90-94 percentiles, the fourth includes authors in the 80-89 percentiles, the fifth those authors in the 50-79 percentiles; and the last category is for authors below the median $q^{r}_{it}$.
Figure 2.5 shows the RMSE of Models 0, 1 and 2 across the different categories.
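The assignment of observations to the six tier groups can be sketched as follows. The percentile rank used here (share of observations with strictly lower recent output) and the treatment of ties are assumptions of this sketch; the names `TIERS`, `percentile_rank` and `tier` are hypothetical.

```python
from bisect import bisect_left

# Lower percentile bound of each tier, from the top category downwards.
TIERS = [(99, "top 1%"), (95, "95-98"), (90, "90-94"),
         (80, "80-89"), (50, "50-79"), (0, "below median")]

def percentile_rank(value, sorted_values):
    """Percentage of observations with recent output strictly below value."""
    return 100.0 * bisect_left(sorted_values, value) / len(sorted_values)

def tier(value, sorted_values):
    """Label of the productivity tier that a given recent output falls into."""
    rank = percentile_rank(value, sorted_values)
    for lower_bound, label in TIERS:
        if rank >= lower_bound:
            return label

# Made-up sample of 100 distinct recent-output values, sorted in ascending order.
sample = list(range(1, 101))
labels = [tier(v, sample) for v in (100, 96, 91, 81, 51, 50)]
```

With 100 distinct values, the largest observation falls in the top 1% and the 50th smallest falls below the median.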
For the most productive authors, those above the 99th percentile, network variables have predictive power in explaining future research output but much less than recent output. For the next category of researchers, those in the 95-98 percentile range, network information has greater predictive power. For instance, network variables alone are as informative about future productivity as recent output $q^{r}_{it}$. Even more strikingly, for researchers in the third category, the 90-94 percentile range, network variables are better at predicting future research output than $q^{r}_{it}$. All the models have statistically significant predictive power across the different tier groups, as shown by the Diebold-Mariano test in Table 2.6.
By contrast, network information has little but significant predictive power for low-productivity individuals (those in the lower half of the distribution). This suggests that, for researchers with low ability or research ambition, having published with high quality coauthors has little informative content regarding their future output, perhaps because they are unable to take advantage of the access to information and research ideas that good coauthors provide.

2.5 Robustness

In this section we investigate the robustness of our results to the various assumptions made in constructing the variables used in the estimation.
In the analysis so far we have used average productivity from $t+1$ to $t+3$ as the variable $q^{f}_{it}$ to be predicted (see equation 2.2). The rationale for doing so is that the distant future is presumably harder to predict than the immediate future, and we want to give the model a fair chance. Yet, in economics there are long lags between the submission and publication of a paper, and wide variation in these lags across papers and journals. Variation in publication lags thus introduces additional variation in the variable we are trying to predict, and may thus lead us to underestimate the predictive power of network information. To check whether this is affecting our results, we repeat the analysis using average future productivity over a five year window instead of three:

$$q^{f}_{it} = \frac{q_{i,t+1} + q_{i,t+2} + q_{i,t+3} + q_{i,t+4} + q_{i,t+5}}{5}$$

and, as before, use $\ln\left(1 + q^{f}_{it}\right)$ as the variable we seek to predict. Results, not shown here to save space, are similar to those reported above except that the predictive power of network variables is slightly larger using a five year window. In particular, we still find that the predictive power of network variables declines as an author gains experience.
In the results reported so far, we kept all authors in the dataset but dropped observations for years in which their past output was zero, i.e., we restricted the estimation to years where $q^{r}_{it} > 0$. The rationale for doing so is that authors who have not published anything in five years have probably not been active in research. Consequently their recent output is bound to be a poor predictor of their future publication record, should they return to an active research career. We nevertheless worry that this may introduce a selection bias. To investigate whether such a bias may have affected our results, we redo the analysis dropping all observations of those authors who, at some point of their academic career, did not publish anything for five years or more. Results, not shown here to save space, are basically unaffected if we omit authors who go in and out of research.

Next we investigate whether results are sensitive to our definition of output $q_{it}$. First, we examine whether different results obtain if we do not correct for article length. It is customary for studies of author and departmental productivity in economics to correct for article length (e.g., Kodrzycki and Yu 2006). But perhaps this is inappropriate for our purpose, e.g., because employers care about publication in top journals, not the length of published articles. To check whether this affects our results, we redo the analysis using a productivity measure that does not correct for the number of published pages, i.e., using:

$$q_{it} = \sum_{j \in S_{it}} \frac{\text{Quality}_j}{\text{Number of authors}_j},$$

as our definition of output. Results, not shown here to save space, show that the predictive power of network variables is unaffected.
We also worry that dividing the quality of an article by the number of coauthors, as we did above and in equation (2.1), underestimates author quality. This is not what Sauer (1988) found with respect to employers, but even if employers only give half-weight to two-author papers, the predictive power of a two-author paper on future output $q^{f}_{it}$ may still be higher than half of an equivalent single-author paper. This could potentially lead to spurious inference.10
To investigate this possibility, we redo our analysis without discounting for the number of coauthors, that is, we calculate output using:

$$q_{it} = \sum_{j \in S_{it}} \text{pages}_j \times \text{journal quality}_j$$

Results are shown in Tables 2.7 and 2.8. The first shows the predictive power of Model 2, while the latter shows the predictive power of multivariate models 2 and 3, respectively. We see that the predictive power of network variables is, if anything, slightly higher than before and statistically significant. From this we conclude that our results regarding the predictive power of network variables are not an artifact of having erroneously underweighted coauthored publications in $q_{it}$.

10 To see why, consider the polar case where the predictive power of a two-author paper is the same as a single-author paper. Suppose we have two identical authors A and B who, after $t$, publish the same quality single-authored papers. One of them, A, in the past published alone while B published with a coauthor. If we estimate Model 1 using (2.1) as sole predictor, the future output of B will be under-predicted since, by construction, $q_{At} > q_{Bt}$. To compensate for this, the degree of author B in Model 3 may show up with a significant coefficient even if degree does not, by itself, help predict future output. Put differently, underestimating output by underweighting joint publications may lead us to erroneously attribute predictive power to network variables.
Another possible confounding effect is authors' affiliation. In particular, the predictive power of network variables may depend on whether an author keeps the same affiliation. For example, when someone changes affiliation, the predictive value of network variables may fall because they lose their contacts. Alternatively, affiliation may be correlated with the propensity to coauthor, e.g., because authors isolated in a lower quality environment find it more difficult to coauthor. If a good affiliation also helps authors be productive and authors with a good affiliation have more coauthors, network variables may pick up the predictive power of a good affiliation.
Starting in 1990 the EconLit dataset contains some information on affiliation. Based on this information we identify authors in one of the top 150 departments of economics. We construct a variable $A_{it}$ that takes value one if author $i$ changed affiliation between the $t-4$ to $t$ period and the $t+1$ to $t+3$ time period and 0 otherwise. If the predictive power of network variables is not significant for authors who switch affiliation, this may be construed as circumstantial evidence that network variables have predictive power only because they pick up affiliation. Given that we only have affiliation data from 1990 to 1999, variable $A_{it}$ is only defined for 1994 to 1996 and the usable sample shrinks to those three years only.11
We investigate whether the predictive power of network variables is the same for authors who change affiliation and those who do not. Results for the Multivariate version of Models 2 and 3 are shown in Table 2.9. The null hypothesis is that the coefficients of the network variables interacted with $A_{it}$ are equal to zero, indicating that the predictive power of network variables is unaffected by changes in author affiliation. In both multivariate models, the null hypothesis is rejected, suggesting that the information content of a researcher's network depends on affiliation.
11 We cluster standard errors by author to allow for the possibility that the error term is correlated across periods for a given author.

We also examine which network variables are most affected by the inclusion of an interaction term with $A_{it}$. To this effect, we conduct the same test for each network variable separately. Results, not presented for the sake of brevity, show that the only variable whose predictive power changes is the betweenness centrality measure. To confirm this further, we present in Table 2.9 joint tests for multivariate models 2 and 3 but excluding betweenness centrality from the list of network variables.
In this case, the null hypothesis cannot be rejected in either multivariate model at the 5% significance level. We conclude that the only network variable whose predictive power is affected by changes in affiliation is betweenness centrality.
Finally, there remains a possible concern about functional form: if the predictive power of $q^{r}_{it}$ is non-linear and this non-linearity is correlated with network characteristics, this could generate a spurious predictive power for network variables.
We therefore consider an alternative specification with a quadratic term in $q^{r}_{it}$ to capture a possible non-monotonic relationship between recent and future output. The results, presented in Table 2.10, show that the coefficient of the quadratic term is negative and significant at the 1% level. But network variables remain significant and their predictive power is quite similar to the non-quadratic case. This confirms that network variables do not have predictive power simply because they are capturing a non-linearity in the effect of $q^{r}_{it}$ on $q^{f}_{it}$.

2.6 Exploring the channels

Having established that past network variables help predict future output over
and above what can be predicted from past output, we would of course like to
know why. A definite answer to this question is beyond the scope of this article,
given the limited data at our disposal. We nevertheless venture to offer elements
of an answer based on the data we do have.

2.6.1 Career cycle effects and network variables

We examine more closely the two potential roles of network variables in predicting future productivity. Our maintained assumption is that network variables such as coauthor productivity have greater signalling content than centrality measures, which are more likely to proxy for access to new ideas. Findings are reported in Figure 2.6.
They indicate that network variables such as coauthor productivity $q^{1}_{it}$ are informative at the start of a career, but their predictive power declines sharply over time. In particular, as shown in Table ??, this variable has no predictive power over and above what can be predicted using knowledge of past and recent output after $t = 13$. In contrast, topological variables such as first and second order degree start off with a modest information value but they retain it over time. Table 2.11 shows that the predictive power of the degree variable remains statistically significant across the career time of an author, with the exception of $t = 20$ and $t = 21$.
From this we tentatively conclude that coauthor productivity $q^{1}_{it}$ is informative primarily as a signal of a researcher's ability. Topological variables such as first and second order degree are more revealing of the use of the network as a conduit for ideas that help future productivity. In terms of predictive power, however, the signalling content of networks is quantitatively more important than the flow of ideas.

2.6.2 Predictive value of output on networks

We have argued that one of the possible reasons why the current network of an author $i$ helps predict $i$'s future output is that it is correlated with $i$'s intrinsic quality: coauthors agree to work with $i$ because they know something about $i$'s quality that is not fully captured in $i$'s past output. If $i$'s unobservable quality predicts $i$'s capacity to attract coauthors, so should $i$'s observable quality. It follows that $i$'s past output should help predict $i$'s future network.
To investigate this idea, we proceed as before and seek to predict future network with past network and past output. We begin by estimating the equivalent to Model 1:

Model 1
$$z_{i,t+5} = x_{it}\beta + z_{it}\gamma_1 + \varepsilon_{it}$$

where $z_{it}$ is i's network at time t, $z_{i,t+5}$ is i's network at t + 5 measured using all of i's joint publications between t + 1 and t + 5, and the control vector $x_{it} = \{r_{it}, c_{it}, c^2_{it}, t\}$ is as before. We then add past output variables $q^r_{it}$ and $q^c_{it}$ to obtain the equivalent to Model 3 but with future network as the dependent variable:

Model 3
$$z_{i,t+5} = x_{it}\beta + z_{it}\gamma_1 + q^r_{it}\gamma_2 + q^c_{it}\gamma_3 + \varepsilon_{it}$$

We examine whether including $q^r_{it}$ and $q^c_{it}$ improves the prediction of $z_{i,t+5}$ over and above the prediction obtained from Model 1.
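The comparison between Model 1 and Model 3 can be illustrated with a minimal sketch. It is not the estimation code used in this chapter: the data are simulated, the variable names (`z_past`, `q_recent`, `q_past`, `z_future`) and the coefficients are hypothetical, and the control vector is reduced to a constant. The sketch fits both models by ordinary least squares on one random half of the sample and compares their RMSE on the other half.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated (hypothetical) data: past network and two output measures.
z_past = rng.normal(size=n)
q_recent = rng.normal(size=n)
q_past = rng.normal(size=n)
# Future network depends on past network AND on output, plus noise.
z_future = 0.5 * z_past + 0.4 * q_recent + 0.2 * q_past + rng.normal(size=n)

const = np.ones(n)
X1 = np.column_stack([const, z_past])                    # Model 1 regressors
X3 = np.column_stack([const, z_past, q_recent, q_past])  # Model 3 regressors

# Random split into an estimation half and a hold-out half.
perm = rng.permutation(n)
train, test = perm[: n // 2], perm[n // 2:]


def out_of_sample_rmse(X, y, train, test):
    """Fit OLS on the training rows and return RMSE on the test rows."""
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    resid = y[test] - X[test] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))


rmse1 = out_of_sample_rmse(X1, z_future, train, test)
rmse3 = out_of_sample_rmse(X3, z_future, train, test)
pct_gain = 100 * (rmse1 - rmse3) / rmse1
print(f"Model 1 RMSE: {rmse1:.3f}  Model 3 RMSE: {rmse3:.3f}  gain: {pct_gain:.2f}%")
```

In this simulated setting output matters by construction, so Model 3 should predict better out of sample; with real data the size of the gain is the empirical question.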
We focus on two network variables as dependent variable: degree $n^1_{i,t}$ and coauthors' productivity $q^1_{it}$.¹² If the predictive power of past network on future productivity is due to a signalling effect, we expect i's quality as a researcher to have a strong predictive power on the quality of coauthors that i manages to collaborate with. In contrast, we do not expect a particularly strong predictive effect of i's inherent quality on the number of future coauthors $n^1_{i,t}$: low-ranked authors can find other low-ranked coauthors to publish with.
We first examine whether past performance predicts future degree. The dependent variable is constructed as:
$$z_{i,t+5} = |N_i(G_{t+5})|$$
where $G_{t+5}$ is the co-authorship network constructed using all joint publications between t + 1 and t + 5. The results, presented in Table 2.12, show that going from Model 1 to Model 3, that is, incorporating past and recent output $q^r_{it}$ and $q^c_{it}$, leads to only a modest but significant improvement in prediction accuracy over and above what can be predicted from past degree $z_{it}$ and controls: the improvement in RMSE is only 1.07%.
We then redo the same analysis using coauthor productivity as our network variable of interest. The dependent variable is constructed as:
$$z_{i,t+5} = \sum_{j \in N_i(G_{t+5})} q_{j,t+5}$$
where $N_i(G_{t+5})$ is the neighborhood of i in network $G_{t+5}$ and $q_{j,t+5}$ is the output of j from year t + 1 to year t + 5 (excluding papers that are coauthored with i). Results are shown in Table 2.13. From the R2 of Model 2, we note that past and recent output $q^r_{it}$ and $q^c_{it}$ explain about 28% of the variation in future coauthors' productivity $z_{i,t+5}$. Including $q^r_{it}$ as a predicting variable in addition to $q^c_{it}$ decreases RMSE by 6.1%. Output variables $q^r_{it}$ and $q^c_{it}$ remain significant after we include $z_{it}$ as an additional regressor. The predictive power of $z_{it}$ is high: the RMSE declines from 1.642 down to 1.536. In addition, if we go from Model 1 to Model 3, i.e., start with $z_{it}$ and add $q^r_{it}$ and $q^c_{it}$, we note a large improvement in predictive power, with RMSE falling by 4.18%.

¹² We also examined whether past output helps forecast betweenness. The results are similar to those obtained for degree and are omitted here.
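The RMSE percentage differences reported in the tables appear to be reductions relative to the RMSE of Model 0. As a quick arithmetic check under that assumption, the sketch below recomputes them from the RMSE values reported in Table 2.13. The function name `rmse_pct_diff` is ours, and small discrepancies with the table are expected because the published RMSE values are rounded to three decimals.

```python
def rmse_pct_diff(rmse_base, rmse_model):
    """Percentage reduction in RMSE of a model relative to a baseline."""
    return 100.0 * (rmse_base - rmse_model) / rmse_base


# RMSE values reported in Table 2.13 (predicting coauthors' productivity).
rmse = {"Model 0": 1.749, "Model 1": 1.609, "Model 2": 1.642, "Model 3": 1.536}

diffs = {name: rmse_pct_diff(rmse["Model 0"], value)
         for name, value in rmse.items() if name != "Model 0"}
for name, d in diffs.items():
    print(f"{name}: {d:.2f}% lower RMSE than Model 0")

# Gain from adding output to past network, in points of Model 0 RMSE.
gain_points = diffs["Model 3"] - diffs["Model 1"]
print(f"Model 3 vs Model 1: {gain_points:.2f} percentage points")
```

Read this way, the 4.18% quoted for the step from Model 1 to Model 3 is the gap between the two percentage reductions, both measured against Model 0.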
These findings are consistent with the idea that the predictive power of past network on future output is at least partly due to signalling, that is, due to correlation between an author's intrinsic quality and the quality of his or her coauthors. The most likely explanation is that high quality coauthors only agree to collaborate with high quality authors.

2.6.3 Duration of predictive value

Another possible explanation for why the collaboration network helps predict future productivity is network externalities, i.e., that having coauthors helps authors be more productive. Presumably such network externalities have a short-term effect. After all, publication success in academia requires remaining informed about new ideas and changing paradigms. In contrast, an author's intrinsic quality is largely time invariant. This suggests an indirect way of testing for network externalities: while the signalling effect of network variables should have predictive value both in the short and long run, network externalities should only help predict research output in the short run.
To implement this idea, we examine whether the capacity of past network variables to predict future output falls with an increasing forecasting time horizon. The new dependent variable is the average productivity of author i from t + 4 to t + 6, that is, $q^f_{it}$ three years ahead:
$$q^f_{i,t+3} = \frac{q_{i,t+4} + q_{i,t+5} + q_{i,t+6}}{3}.$$
Results are presented in Table 2.14. We find that the predictive power of the network variables is only slightly smaller for $q^f_{i,t+3}$ than for $q^f_{it}$. This finding is consistent with the signalling hypothesis.

2.6.4 Diversity

This test, however, ultimately relies on the untested assumption that network externalities decay rapidly. To provide less roundabout evidence, we make use of the EconLit data to construct an index of diversity in the coauthorship network. Social interactions partly determine the diversity of dispersed knowledge that a person can access (Bonacich, 1987). Coauthors channel ideas and information that are most useful when they do not duplicate each other. Hence someone who collaborates with others in different fields may bring to future research projects not just his or her skills and cumulative experience, but also the dispersed resources and knowledge accessible via network ties (Singh, 2007). Having access to dispersed knowledge gives a competitive advantage in developing new and more creative ideas (Burt, 2003), which might result in future output gains. Hence an author who has collaborators in a narrow field of study presumably has less access to new ideas from other sub-fields and disciplines, while an author with diverse coauthors may gain access to ideas that will make him or her more creative in the medium run.
To investigate this idea, we examine whether the diversity of an author's social neighborhood helps predict his or her future output, as should be the case if network externalities happen through access to diverse ideas. As a measure of neighborhood diversity, we apply the cosine similarity index proposed by Fafchamps, Goyal and Van der Leij (2010) to capture field overlap between two authors. Using the EconLit database, Fafchamps et al. (2010) categorize articles into 121 subfields according to the first two digits of their JEL codes. Articles with multiple JEL codes are assigned proportionally to each of the corresponding fields. So for instance, if an article lists three JEL codes, this article is given a weight of a third in the three corresponding JEL fields. Let $x^s_{it}$ denote the fraction of all articles published by i in field s over the period from t - 4 to t. By construction, $\sum_s x^s_{it} = 1$. Further let $x^s_{N_i,t}$ denote the fraction of articles written by the coauthors of i in field s between t - 4 and t, excluding the work coauthored with i. The cosine similarity index between author i and his or her neighborhood $N_i(G_t)$ is defined as:
$$w_{it} = \frac{\sum_s x^s_{it}\, x^s_{N_i,t}}{\sqrt{\sum_s \left(x^s_{it}\right)^2 \sum_s \left(x^s_{N_i,t}\right)^2}}$$

The higher the overlap in fields of interest between i and $N_i$, the higher the index $w_{it}$ is. In other words, a lower value of $w_{it}$ means that i has a more diverse neighborhood. Building on the evidence presented in Fafchamps et al. (2010), we expect an inverted-U relationship between diversity $w_{it}$ and future productivity $q^f_{it}$: diversity grants access to more varied ideas and this should help improve productivity; but there may be a limit to that process, i.e., a point beyond which too much diversity becomes an unnecessary distraction. We include a square term in $w_{it}$ to allow for this possibility.
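As an illustration of the index, the following sketch computes field shares with proportional JEL weighting and the cosine similarity between an author and the pooled work of his or her coauthors. The JEL fields and article lists are invented for the example, and the helper names `field_shares` and `cosine_similarity` are ours rather than taken from Fafchamps et al. (2010).

```python
import math
from collections import defaultdict


def field_shares(articles):
    """Map a list of articles (each a list of JEL fields) to field shares.

    An article with k fields contributes 1/k to each of them, so the
    resulting shares sum to one.
    """
    weights = defaultdict(float)
    for fields in articles:
        for f in fields:
            weights[f] += 1.0 / len(fields)
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


def cosine_similarity(x, y):
    """Cosine similarity between two share vectors stored as dicts."""
    dot = sum(x[f] * y.get(f, 0.0) for f in x)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)


# Hypothetical data: each article is listed with its JEL fields.
author_articles = [["D8", "L1", "C7"], ["D8"]]
neighbor_articles = [["D8"], ["G2", "E4"]]  # coauthors' work, excluding joint papers

x_i = field_shares(author_articles)
x_N = field_shares(neighbor_articles)
w_it = cosine_similarity(x_i, x_N)
print(x_i)
print(round(w_it, 3))
```

An author compared with an identical field profile gets an index of one, and an author whose coauthors work in entirely different fields gets zero, so lower values correspond to a more diverse neighborhood.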
We evaluate the predictive power of $w_{it}$ and $w^2_{it}$ using the same empirical strategy as before. The results, presented in Table 2.15, show a strong nonlinear relationship between diversity and productivity. But the predictive power of the cosine similarity index is low: the RMSE decreases from 0.787 to 0.778 after incorporating $w_{it}$ and $w^2_{it}$. The small but statistically significant difference between the RMSE of Model 3 and the RMSE of Model 1 suggests that prediction accuracy improves only slightly over and above what can be predicted by recent output. These results fail to provide support for the idea that the predictive power of past network is driven by network externalities capturing access to varied research ideas.
The possibility remains that network externalities in the form of new ideas take more time to affect published output. To investigate this possibility, we examine the predictive power of $w_{it}$ (and $w^2_{it}$) on $q^f_{i,t+3}$ and $q^f_{i,t+6}$, that is, the output of author i in years t + 4 to t + 6 and in years t + 7 to t + 9, respectively. Results, not shown here to save space, confirm the low predictive power of the cosine similarity index on future output.
To summarize this section, we have found further evidence consistent with the signalling role of past coauthorship networks. In contrast, we have found no additional evidence suggesting that the predictive power of network variables is due to strong network externalities: we find no evidence that the diversity of someone's coauthors helps predict future productivity, as would have been the case if diversity in field interests granted access to a wider array of research ideas. These findings are not definitive, since they are limited by the available data, but they are nevertheless suggestive.

2.7 Concluding remarks

In this chapter we have examined whether information about a researcher's coauthor network helps predict their future output. Underlying our study are two main ideas. First, a collaboration that results in a published article reveals valuable information about an author's ability and research ambitions. This is particularly true for junior researchers whose type cannot be fully assessed from their cumulative output. Second, research networks, as evidenced by coauthorship, provide access to new research ideas that can be turned into subsequently published papers. But doing so requires a minimal level of ability and dedication that not all researchers possess.
To investigate these ideas, we examine coauthorship in economics. Our focus is not on causality but rather on predictive power. For this reason, we adopt a methodology that eliminates data mining and minimizes the risk of pre-testing bias. To this effect we randomly divide the data into two halves. Parameter estimates are obtained with one half, but predictions are judged by how well they perform in the other half of the sample.
We find that information about someone's coauthor network leads to a modest improvement in the forecast accuracy of their future output over and above what can be predicted based on their past individual output. The network variables that have the most information content are the productivity of coauthors, closeness centrality, and the number of past coauthors. These results are robust to alternative specifications and definitions of variables.
We hypothesize that the predictive power of network variables derives from two fundamentally distinct phenomena: the signalling content of past collaborations, which we conjecture is particularly large with high productivity coauthors; and the access to new ideas that the network provides, a concept that is best proxied by an author's position in the network's topography. We present evidence suggesting that the signalling content of the network is quantitatively more important than the flow of ideas in terms of predictive power. But the former declines sharply while the latter remains stable over time. Furthermore, we find no evidence that the diversity of one's coauthors helps predict future productivity.
We also investigate whether the predictive power of network variables is stronger for more talented researchers, as would be the case if taking advantage of new ideas requires talent and dedication. We find that the predictive value of network information is non-existent for less talented and dedicated researchers: network variables help predict future productivity for researchers above the median. We also find that for the most able researchers, those in the top 1% of the output distribution, network variables have no information content, possibly because these researchers are so talented and dedicated that they would succeed irrespective of their past collaboration history.
The work presented here leaves many questions unanswered. In particular, we do not claim to have identified a causal effect of co-authorship or network quality on future output. If anything, the signalling hypothesis is based on a reverse causality argument, and it receives the most support from our analysis. We do, however, also find evidence that network connections are most useful to talented researchers, a result that is consistent with a causal relationship between the flow of research ideas and future output, except that talent is needed to turn these ideas into publishable papers. These issues deserve more research.


Figure 2.5: RMSE % Difference across productivity tiers. [Figure not reproduced. Six panels, one per productivity tier: Tier 1 (>99%), Tier 2 (95-98%), Tier 3 (90-94%), Tier 4 (80-89%), Tier 5 (50-79%) and Tier 6 (<50%). Each panel shows the RMSE % Difference for Model 1 and for the network variables Degree, Degree of order 2, Betweenness, Coauthors productivity, Coauthors of coauthors productivity and Top 1%.]


Model 1
Recent past output
Model 2
Degree
Degree of order 2
Giant component
Closeness
Betweenness
Coauthors productivity
Coauthors of Coauthors prod.
Working with a top 1%
-8.27(.00)
-8.39(.00)
-7.53(.00)
-8.00(.00)
-7.75(.00)
-8.41(.00)
-7.94(.00)
-8.04(.00)

Tier 4

-5.11(.00) -7.06(.00) -6.01(.00)


-5.48(.00) -5.88(.00) -6.34(.00)
-4.08(.00) -6.96(.00) -5.32(.00)
-4.37(.00) -6.27(.00) -6.12(.00)
-3.70(.00) -6.57(.00) -5.05(.00)
-4.50(.00) -7.30(.00) -5.51(.00)
-4.61(.00) -7.19(.00) -5.83(.00)
-4.10(.00) -6.07(.00) -4.89(.00)
p-value in parenthesis.

Tier 3
-7.64(.00)

-5.35(.00)

Tier 2
-4.89(.00)

-4.13(.00)

Tier 1

Tier 5

-12.19(.00)
-11.76(.00)
-11.37(.00)
-11.31(.00)
-12.18(.00)
-12.01(.00)
-11.41(.00)
-9.94(.00)

-10.72(.00)

Table 2.6: Diebold-Mariano test across different tiers group

-9.76(.00)
-9.78(.00)
-9.91(.00)
-10.02(.00)
-9.91(.00)
-10.75(.00)
-10.37(.00)
-9.49(.00)

-9.27(.00)

Tier 6


Table 2.7: Prediction accuracy of Model 2. Using non-discounted productivity

                                R2     RMSE   RMSE Diff.   Coefficients
Model 0                         .292   .886
Model 1
  Recent past prod.             .421   .800   9.74%        .34
Model 2
  Degree                        .334   .858   3.15%        .13
  Degree of order 2             .333   .859   3.08%        .05
  Coauthors productivity        .352   .847   4.50%        .15
  Coauthors of Coauthors prod.  .341   .854   3.68%        .11
  Giant component               .320   .869   1.99%        .44
  Closeness                     .334   .860   3.01%        .62
  Betweenness                   .342   .855   3.58%        .18
  Working with a top 1%         .301   .880   .67%         .23

Significant at 1% level.

Table 2.8: Prediction accuracy of the multivariate models. Using non-discounted productivity

                        (1) R2   (2) RMSE   (3) RMSE % Diff.
Model 0                 .292     .886
Model 1                 .421     .800       9.74%
Multivariate Model 2    .380     .829       6.45%
Multivariate Model 3    .443     .785       11.39%

Significant at 1% level. The Diebold-Mariano test of the Multivariate Model 3 is performed for the RMSE difference between Multivariate Model 3 and Model 1.

Table 2.9: Testing variations on the network's predictive power after changes on authors' affiliations.

Models\Tests                                  F-test (statistic)   p-value
Multivariate Model 2                          1.77                 .06
Multivariate Model 3                          1.94                 .04
Multivariate Model 2 excluding Betweenness    1.56                 .17
Multivariate Model 3 excluding Betweenness    1.66                 .09


Table 2.10: Prediction accuracy of Multivariate Models. Including quadratic recent output

                            R2     RMSE   RMSE % Diff.   Coefficients
Model 0                     .271   .759
Model 1                     .417   .676   10.83%
  Recent output                                          .55
  Recent output squared                                  -.11
Multivariate Model 2        .366   .707   6.81%
Multivariate Model 3        .442   .662   12.68%
  Recent output                                          .43
  Recent output squared                                  -.08

Significant at 1% level. The Diebold-Mariano test of the Multivariate Model 3 is performed for the RMSE difference between Multivariate Model 3 and Model 1.


Figure 2.6: Network topology vs. coauthor productivity across life cycle. [Figure not reproduced. For each of the Degree, Degree 2 and Coauthors productivity models there are two panels plotted against career time: "RMSE across Career time", showing the RMSE of Models 0, 1, 2 and 3, and "RMSE % Difference", showing the RMSE % difference of Models 1, 2 and 3 with respect to Model 0.]

-9.17(.00) -10.06(.00)
-8.28(.00) -9.15(.00)
-8.40(.00) -8.62(.00)
-8.50(.00) -8.24(.00)
-8.72(.00) -7.74(.00)
-8.31(.00) -7.11(.00)
-7.72(.00) -6.04(.00)
-5.59(.00) -3.57(.00)
-5.65(.00) -4.65(.00)
-4.75(.00) -5.13(.00)
-4.05(.00) -4.12(.00)
-3.30(.00) -2.81(.00)
-4.05(.00) -3.17(.00)
-3.34(.00) -4.11(.00)
-1.26(.21) -2.48(.01)
-1.89(.06) -2.13(.03)
-3.01(.00) -4.04(.00)

-14.56(.00)
-13.24(.00)
-12.42(.00)
-11.38(.00)
-9.59(.00)
-9.61(.00)
-7.90(.00)
-5.81(.00)
-4.89(.00)
-5.30(.00)
-4.32(.00)
-2.97(.00)
-2.53(.01)
-3.21(.00)
-1.73(.09)
-1.38(.17)
-3.56(.00)

-3.78(.00)
-3.30(.00)
-3.81(.00)
-3.78(.00)
-4.83(.00)
-4.68(.00)
-4.61(.00)
-1.97(.05)
-3.16(.00)
-2.20(.03)
-2.03(.04)
-1.21(.23)
-2.71(.00)
-1.80(.07)
-0.29(.77)
-1.60(.11)
-3.32(.00)

Degree
-5.36(.00)
-4.57(.00)
-4.38(.00)
-4.01(.00)
-4.54(.00)
-4.11(.00)
-3.22(.00)
-0.36(.72)
-2.73(.00)
-2.82(.00)
-2.22(.03)
-0.81(.42)
-1.28(.19)
-2.68(.01)
-1.02(.31)
-1.31(.19)
3.52(.00)

-8.05(.00)
-7.14(.00)
-6.75(.00)
-5.77(.00)
-4.69(.00)
-4.96(.00)
-3.42(.00)
-1.12(.26)
-1.64(.10)
-2.26(.02)
-1.26(.21)
0.45(.66)
0.26(.79)
-0.84(.40)
-0.64(.52)
-0.72(.47)
3.13(.00)

Model 3
Degree 2 Coauth. prod.

p-value in parenthesis. The Diebold-Mariano test of the Multivariate Model 3 is performed for the RMSE difference
between Multivariate Model 3 and Model 1.

Career time
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

Degree

Model 2
Degree 2 Coauth. prod.

Table 2.11: Diebold-Mariano test across career time. Model 2 and Model 3


Table 2.12: Predicting Degree

           R2     RMSE    RMSE % Diff.
Model 0    .092   2.049
Model 1    .372   1.757   14.25%
Model 2    .139   1.992   2.78%
Model 3    .385   1.735   15.32%

Significant at 1% level. The Diebold-Mariano test of the Model 3 is performed for the RMSE difference between Model 3 and Model 1.

Table 2.13: Predicting Coauthors Productivity.

           (1) R2   (2) RMSE   (3) RMSE % Diff.
Model 0    .185     1.749
Model 1    .308     1.609      7.995%
Model 2    .275     1.642      6.113%
Model 3    .367     1.536      12.17%

Significant at 1% level. The Diebold-Mariano test of the Model 3 is performed for the RMSE difference between Model 3 and Model 1.

Table 2.14: Prediction accuracy: Long-run future productivity.

                        R2     RMSE   RMSE % Diff.
Model 0                 .235   .735
Model 1                 .364   .668   9.065%
Multivariate Model 2    .325   .693   5.786%
Multivariate Model 3    .392   .656   10.72%

Significant at 1% level. The Diebold-Mariano test of the Model 3 is performed for the RMSE difference between Model 3 and Model 1.


Table 2.15: Prediction accuracy: Diversity Models.

                                    R2    RMSE   RMSE % Diff.   Coefficients
Model 0                             .28   .787
Model 1                             .43   .702   10.84%
  Current stock productivity                                    .33
Model 2                             .30   .778   1.16%
  Neighborhood Diversity                                        1.14
  Neighborhood Diversity Squared                                -1.01
Model 3                             .43   .701   10.95%
  Neighborhood Diversity                                        .41
  Neighborhood Diversity Squared                                -.38

Significant at 1% level. The Diebold-Mariano test of the Model 3 is performed for the RMSE difference between Model 3 and Model 1.


SECOND PART

Chapter 3
Excess Financial Development and Economic Growth

3.1 Introduction

Financial intermediaries that are better at ameliorating information asymmetries and facilitating transactions exert a positive influence on economic growth (Levine, Loayza and Beck, 2000). However, recent crises suggest that an excess of private credit or financial development might harm economic growth under certain circumstances. The traditional approach focuses on a linear relationship between economic growth and financial development, though non-linearities might be important. Figure 3.1 plots the average economic growth rates of 63 developed and developing economies against their average financial development, measured by the amount of private credit issued by deposit money banks and other financial institutions to the private sector as a share of GDP, over 1970-2005.
The relationship looks nonlinear, positive for low and intermediate levels of financial development, and negative for high levels of financial development.¹ Low levels of financial development characterize mainly low-income countries. The inverted-U relationship is even more pronounced if we keep the outliers (removed in Figure 3.1), that is, countries characterized by a very high level of financial development and a relatively low GDP growth rate: USA, Switzerland and Japan.

¹ In a quadratic fit, the coefficients on the level and the square of the financial development measure are 0.1589 with std. error 0.0325 and -0.0013 with std. error 0.0003, respectively.
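For a sense of where such a fitted curve peaks, recall that a quadratic $a + b\,x + c\,x^2$ has its turning point at $x^* = -b/(2c)$. The sketch below applies this to the rounded coefficients of the quadratic fit reported in footnote 1. It assumes that financial development is measured in percent of GDP, as the axis of Figure 3.1 suggests, and the result is only indicative because the coefficient on the squared term is reported with two significant digits. The function name `turning_point` is ours.

```python
def turning_point(b, c):
    """Return the x at which a + b*x + c*x**2 reaches its extremum."""
    if c == 0:
        raise ValueError("no turning point: the fit is linear")
    return -b / (2.0 * c)


# Rounded coefficients of the quadratic fit (level and square terms).
b_level, c_square = 0.1589, -0.0013

fd_star = turning_point(b_level, c_square)
# c < 0 means the extremum is a maximum (inverted U).
print(f"Growth peaks near private credit of {fd_star:.1f}% of GDP")

# Sensitivity to the rounding of the squared-term coefficient.
low = turning_point(b_level, -0.00135)
high = turning_point(b_level, -0.00125)
print(f"Range implied by rounding: {low:.1f} to {high:.1f}")
```

The implied peak lies at an intermediate level of private credit, which is in line with the visual impression that growth is lower at both ends of the financial development range.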


Figure 3.1: Financial development as a determinant of economic growth. Financial Development is measured as the ratio of total private credit to GDP. Economic growth is the real GDP growth. Data source: WBI and Beck and Demirgüç-Kunt (2009). [Figure not reproduced: scatter plot of countries with Financial Development on the horizontal axis and Economic Growth on the vertical axis.]

In this chapter we re-examine the effect of financial development on economic growth. Analyzing the state of real and financial progress during the last four decades leads us to the following hypothesis: financial crises may occur because of an over-developed financial system. This hypothesis is tested empirically, using data on economic growth and indicators of countries' financial development. First, we re-estimate the effect of financial development on economic growth using the traditional specifications implemented in the literature and an updated dataset. We find that financial development has a negative effect on economic growth for the period 1970-2005. Second, we consider the growth in the industrial sector (consisting of manufacturing and energy) and the growth in the financial sector (consisting of financial services) as two forces that jointly determine the economic growth of the country. We suggest that smooth economic development requires balanced technological progress in both the productive and the financial sectors of the economy. Technological progress in the real sector expands the production capacities of the economy, while growth in the financial sector allows these new capacities to be used efficiently. We define excess financial development as a measure of the difference between the output growth of these two sectors that leads to a decline in total output. We concentrate on the real sector of the economy because it is considered to be the main predictor of productivity growth and thus of technological progress of the country (see Kaldor, 1967, and followers). The services sector, regardless of its growing weight in total output, does not add predictive power to our estimated model.²
The results suggest that there exists a critical level of financial development beyond which the effect of financial development on economic growth is negative, both in the short and long run. In particular, in the short run, when the growth of the financial sector is not accompanied by real sector growth and the difference between the two growth rates is higher than 4.45%, the effect of financial development on economic growth becomes negative. These results are robust to the specification of the excess financial development variable.
Up to now, the empirical literature has not concentrated on the possible negative effect of excess financial development on economic growth. The traditional empirical approach compares economic and financial conditions among countries at different stages of economic development. The works by Levine, Loayza and Beck (2000), Aghion, Howitt and Mayer-Foulkes (2005), and Michalopoulos, Laeven and Levine (2009) are examples of studies that concluded that financial development facilitates economic growth and countries' convergence. These results seem to be robust to different estimation techniques (see for example, Oguzoglu and Stengos, 2008, Dabos and Williams, 2009). Moreover, Aghion, Howitt and Mayer-Foulkes (2005) build a theory and find evidence that the effect of financial development on economic growth vanishes as the country approaches the technological frontier. However, the authors do not consider the possibility of the economy overshooting due to an excess supply of financial funds, which would lead to a negative effect of financial development on economic growth. More closely related to our work, Arcand, Berkes and Panizza (2011) reevaluate the results of Levine, Loayza and Beck (2000) allowing for a nonlinear relation between financial development and growth. They interpret the availability of excess finance in the economy as a result of expectations of bailout.

² See Appendix C2 for a description of the results.
The theoretical justification of the ideas discussed in this chapter may rely on the theory of informational overshooting, as considered by Rob (1991), Zeira (1994), and Zeira (1999). The economy grows, together with its financial system, as long as it does not reach its production capacity limit. Given that this limit is unknown, rational agents continuously learn about it. Expectations are optimistic until, at some point in time, the economy hits the limit. Technological progress in the real sector expands the capacity limit. We suggest that financial crises may be prevented as long as the pace of progress in the real sector is higher than the pace of progress in the financial sector. An alternative theoretical justification for the existence of excess finance may rely on negative externalities resulting from an over-developed financial system.³
The existing literature proposes a number of other justifications for a cautious attitude towards fast development of the financial sector. The idea of informational overshooting by Zeira (1994) is used by Biais, Rochet and Woolley (2009) to explain financial crises. The authors show how the uncertain strength of an innovation leads to growing confidence and rents as long as the innovation bound is not reached. If the innovation is fragile, the economy goes into crisis. Barbarino and Jovanovic (2007) provide a microfoundation for market crashes based on informational overshooting. Given uncertain demand, the economy grows together with optimism until demand outstrips capacity. They provide an explanation of the dot-com crisis of 2000-2001. A similar idea is developed in Wang (2007). Gennaioli, Shleifer and Vishny (2010) demonstrate how financial services may be excessive when investors face certain unlikely risks. When the risks are recognized, investors switch away from the risky securities, and the markets become fragile because of the excessive volume of the new claims. Finally, Santomero and Seater (2000) derive the optimal size of the financial sector by evaluating the trade-off between the costs of maintaining this sector and the benefits of improved efficiency that arise because the financial sector monitors the production process.

³ For related research on the topic, see for example, Philippon (2010) and Bruno, Rochet and Woolley (2009).
We should stress the differences and similarities between this chapter and two recent closely related articles: the work by Arcand, Berkes and Panizza (2011) and the work by Dabos and Williams (2009). Similar to Dabos and Williams (2009), we re-evaluate the effect of financial development on economic growth using the same dataset as Levine, Loayza and Beck (2000), estimating the model by two-step system GMM, correcting the standard errors as described in Windmeijer (2005) and applying a reduced set of instruments. We find that the financial development measure is a significant and positive determinant of economic growth for the period 1961-1995. In addition, our work estimates the effect of financial development on economic growth in an updated dataset. We use a set of variables analogous to those used by Levine, Loayza and Beck (2000), but extended to the period 1970-2005. Once we re-estimate our model on this extended dataset using two-step system GMM estimation, with corrected standard errors and a reduced set of instruments, the effect of financial development on economic growth becomes negative. Consistent with previous findings, when we focus on the period before the 1990s, the updated dataset yields a positive effect of financial development measures on economic growth.
Similar to Arcand, Berkes and Panizza (2011), we analyze the non-monotonic relationship between economic growth and financial development. Our results are consistent with their findings, that is, we obtain a non-linear relationship between financial development and economic growth; too much finance might lead to a reduction of economic growth. However, the rest of the coefficients in such regressions are insignificant, and the validity of the instruments is doubtful, as shown by the Sargan test statistics.⁴ Therefore, we test a deeper relationship between finance and growth, based on the relative rates of development of the real and financial sectors of the economy.

⁴ See tables in Appendix C1, or the results reported in Table 3 of Arcand, Berkes and Panizza, 2011.

3.2 Financial Development and Growth

The aim of this section is to re-evaluate the effect of financial development on economic growth, using the traditional measures of financial development and econometric tools that yield robust and consistent estimates. We re-estimate the results of Table 5 in Levine, Loayza and Beck (2000) using a reduced set of instruments and implementing the Windmeijer (2005) small-sample correction for the two-step standard errors, without which those standard errors tend to be severely downward biased (Roodman, 2006). We find evidence that the influence of financial development on economic growth is not always positive. We suggest that the effect of financial development on economic growth might depend on the growth and characteristics of the other sectors.
Our basic hypothesis states that, in order to have a positive influence on economic growth, a country's financial development must be accompanied by corresponding technological development in other productive sectors of the economy. The next section proposes several tests of this hypothesis.
In the rest of this section we briefly describe the methodology applied in Levine, Loayza and Beck (2000) and related studies to evaluate the effect of financial development on economic growth. Then, we briefly review the data used in the estimation and present our estimation results.

3.2.1

Methodology

Similar to Levine, Loayza and Beck (2000), Dabos and Williams (2009) and
Arcand, Berkes and Panizza (2011), we use the System Generalized-Method-of-Moments (GMM) estimator developed for dynamic panel data models by Arellano
and Bover (1995) and augmented by Blundell and Bond (1998). Following Levine,
Loayza and Beck (2000), the regression equation considered is:
$$ y_{i,t} - y_{i,t-1} = (\alpha - 1)\, y_{i,t-1} + \beta' X_{i,t} + \eta_i + \varepsilon_{i,t}, \tag{3.1} $$

where $y$ is the logarithm of real per capita GDP, $X$ represents the set of explanatory variables, $\eta$ is an unobserved country-specific effect, $\varepsilon$ is the error term, and
the subscripts $i$ and $t$ represent country and time period, respectively. We can
rewrite equation (3.1) as

$$ y_{i,t} = \alpha\, y_{i,t-1} + \beta' X_{i,t} + \eta_i + \varepsilon_{i,t}. \tag{3.2} $$

Then, we eliminate the country-specific effect by taking first differences of equation (3.2):

$$ y_{i,t} - y_{i,t-1} = \alpha\, (y_{i,t-1} - y_{i,t-2}) + \beta' (X_{i,t} - X_{i,t-1}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1}). \tag{3.3} $$

Under this specification the explanatory variables are endogenous because
of feedback from growth to its determinants, or because of the common effects
of omitted variables on both growth and its explanatory variables, or perhaps
due to measurement error in some proxy variables. Moreover, in equation (3.3)
the new error term, $\varepsilon_{i,t} - \varepsilon_{i,t-1}$, is by construction correlated with the lagged
dependent variable, $y_{i,t-1} - y_{i,t-2}$. Under the assumptions that the error term, $\varepsilon$,
is not serially correlated, and that the explanatory variables, $X$, are weakly exogenous
(uncorrelated with future realizations of the error term), the GMM dynamic panel
estimator uses lags of the explanatory variables in levels as instruments to solve the
endogeneity problem under the following moment conditions:

$$ \begin{aligned} E\left[ y_{i,t-s}\, (\varepsilon_{i,t} - \varepsilon_{i,t-1}) \right] &= 0 \quad \text{for } s \geq 2,\; t = 3, \ldots, T, \\ E\left[ X_{i,t-s}\, (\varepsilon_{i,t} - \varepsilon_{i,t-1}) \right] &= 0 \quad \text{for } s \geq 2,\; t = 3, \ldots, T. \end{aligned} \tag{3.4} $$
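To illustrate why lagged levels can act as instruments in the differenced equation, the following sketch simulates a simple autoregressive panel with a country-specific effect and no additional regressors. It compares pooled OLS in levels with a just-identified instrumental-variable estimate that uses the level dated t-2 as the instrument for the lagged difference. This is a simplified Anderson-Hsiao-type estimator, not the System GMM estimator used in this chapter; the parameter values, the sample size and all function names are illustrative assumptions.

```python
import random


def simulate_panel(n_units, n_periods, alpha, seed=0):
    """Simulate y_it = alpha * y_i,t-1 + eta_i + eps_it (hypothetical data)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        eta = rng.gauss(0.0, 1.0)  # unobserved unit-specific effect
        y = eta / (1.0 - alpha) + rng.gauss(0.0, 1.0)  # start near the unit mean
        series = [y]
        for _ in range(n_periods - 1):
            y = alpha * y + eta + rng.gauss(0.0, 1.0)
            series.append(y)
        panel.append(series)
    return panel


def pooled_ols_levels(panel):
    """OLS slope of y_t on y_{t-1} with an intercept, ignoring the fixed effect."""
    xs, ys = [], []
    for series in panel:
        for t in range(1, len(series)):
            xs.append(series[t - 1])
            ys.append(series[t])
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def iv_first_difference(panel):
    """IV estimate of alpha in the differenced equation, instrumenting
    (y_{t-1} - y_{t-2}) with the level y_{t-2}."""
    num = 0.0
    den = 0.0
    for series in panel:
        for t in range(2, len(series)):
            dy = series[t] - series[t - 1]
            dy_lag = series[t - 1] - series[t - 2]
            instrument = series[t - 2]
            num += instrument * dy
            den += instrument * dy_lag
    return num / den


if __name__ == "__main__":
    data = simulate_panel(n_units=20000, n_periods=8, alpha=0.5, seed=1)
    print("pooled OLS in levels:", round(pooled_ols_levels(data), 3))
    print("IV on first differences:", round(iv_first_difference(data), 3))
```

In this setting the levels regression overstates the persistence parameter because the lagged dependent variable is correlated with the country-specific effect, whereas the instrumented differenced equation should recover a value close to the one used in the simulation.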

We apply the system GMM estimator, which combines the difference estimator defined above with a levels estimator. Including an equation for the variables
in levels allows us to use information on differences among countries that comes
purely from the cross-section part of the sample.
The levels equation uses lags of differences of explanatory variables as instruments. These are appropriate instruments under the following additional assumption: although there may be correlation between the levels of the right-hand side
variables and the country-specific effect in equation (3.2), there is no correlation
between the differences of these variables and the country-specific effect:5

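$$ \begin{aligned} E\left[ (y_{i,t-1} - y_{i,t-2})\, (\eta_i + \varepsilon_{i,t}) \right] &= 0, \\ E\left[ (X_{i,t-1} - X_{i,t-2})\, (\eta_i + \varepsilon_{i,t}) \right] &= 0. \end{aligned} \tag{3.5} $$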

A GMM procedure is employed to generate consistent and efficient parameter
estimates. We provide the necessary checks of the consistency of this estimator.
In particular, we use Sargan and Hansen tests of over-identifying restrictions to
test the exogeneity of the instruments; we check the validity of the assumption
that the error terms are not serially correlated; we deal with the small-sample
bias of the standard errors by applying the correction suggested by Windmeijer
(2005); and we collapse the instruments in blocks, as too many instruments can
overfit endogenous variables and fail to expunge their endogenous components
(Roodman, 2006).6
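The effect of collapsing on the size of the instrument set can be sketched with a simple count. The function below is a simplified illustration for a single variable in the differenced equation only, assuming all available lags from `min_lag` onwards are used; it ignores the levels-equation instruments and any other regressors, and its name and arguments are hypothetical rather than part of xtabond2.

```python
def gmm_style_instrument_count(n_periods, min_lag=2, collapse=False):
    """Count instruments generated by one variable in the differenced equation.

    Without collapsing there is one instrument per period and per available
    lag; with collapsing there is one instrument per lag distance.
    """
    if collapse:
        # Lag distances min_lag, ..., n_periods - 1 each contribute one column.
        return max(0, (n_periods - 1) - min_lag + 1)
    total = 0
    for t in range(1, n_periods + 1):
        # The equation for period t can use levels dated t - min_lag and earlier.
        total += max(0, t - min_lag)
    return total


if __name__ == "__main__":
    for periods in (5, 7, 10):
        full = gmm_style_instrument_count(periods)
        collapsed = gmm_style_instrument_count(periods, collapse=True)
        print(periods, "periods:", full, "uncollapsed vs", collapsed, "collapsed")
```

The uncollapsed count grows quadratically in the number of periods while the collapsed count grows linearly, which is why collapsing helps keep the instrument count small relative to the number of countries.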

3.2.2

Data

Firstly, we use the same dataset as Levine, Loayza and Beck (2000) to replicate
their results, correcting for the standard error bias and for the possible overfitting
due to a large number of instruments. This dataset consists of non-overlapping
five-year averages from 1961 to 1995 for 74 countries. Five-year averages are used in empirical growth models to smooth out the cyclical patterns
of the data. The dependent variable is economic growth per capita. The set of
control variables includes: the level of real GDP per capita at the beginning of
each five-year period (Initial GDP); a measure of the openness of the country (Trade),
defined as the sum of real exports and imports as a share of real GDP; the inflation
rate (Inflation), defined as the log difference of the Consumer Price Index; government
expenditures to GDP (Gov. size); a proxy for human capital (Schooling), measured as average years of secondary schooling in the population over 15; and
the black market premium (Black mkt premium), defined as the ratio of the black market
exchange rate to the official exchange rate, minus one. The main variables of interest
are the commonly used proxies for financial development:
i) Private credit, defined as the value of credits issued by financial intermediaries to the private sector divided by GDP;
ii) Bank credit, defined as the credit issued by deposit money banks to the
private sector divided by GDP;
iii) Liquid liabilities of the financial system (currency plus demand and interest-bearing liabilities of banks and non-bank financial intermediaries) divided by
GDP.
5 The same moment conditions were used by Levine, Loayza and Beck (2000). For theoretical justification, see Arellano and Bover (1995).
6 We use the Stata module xtabond2, developed by Roodman (2006).
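The definitions of the control variables given above translate directly into simple formulas. The sketch below encodes three of them; the function names and the numerical inputs are hypothetical and serve only to make the definitions concrete.

```python
import math


def inflation_rate(cpi_start, cpi_end):
    """Inflation as the log difference of the Consumer Price Index."""
    return math.log(cpi_end) - math.log(cpi_start)


def black_market_premium(black_market_rate, official_rate):
    """Ratio of the black market to the official exchange rate, minus one."""
    return black_market_rate / official_rate - 1.0


def trade_openness(real_exports, real_imports, real_gdp):
    """Sum of real exports and imports as a share of real GDP."""
    return (real_exports + real_imports) / real_gdp


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    print(inflation_rate(100.0, 105.0))
    print(black_market_premium(12.0, 10.0))
    print(trade_openness(30.0, 20.0, 100.0))
```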
Secondly, we estimate the same specification as in Levine, Loayza and Beck
(2000), but using a new dataset covering the time period 1971-2005. For this
purpose, we consider the same measures of financial development defined above
by i)-iii), from the dataset constructed by Beck and Demirgüç-Kunt (2009). We
use analogous control variables, extracted from the World Bank's World Development
Indicators, except for the proxy for human capital (Tertiary), which in this second
case is measured as enrollment in tertiary education.7
The summary statistics and the data sources used in the estimations are
reported in Appendix C.

3.2.3

Estimation Results

Table 3.1 presents the estimation results based on the dataset used in Levine,
Loayza and Beck (2000) for the time period 1961-1995. Column 1 shows the
results using private credit as the proxy for financial development. Column 2
presents the results using bank credit as the financial variable, and column 3
shows the results using liquid liabilities as a proxy for financial development. The
Windmeijer (2005) standard error correction is implemented, and we also collapse
the instruments. Period-specific dummies are included to account for specific
time trends.
7 We were not able to find the human capital proxy used by Levine, Loayza and Beck (2000)
for the years after 1995. The results of estimation are robust to this change of variable.
Table 3.1: Financial Intermediation and Growth: Re-estimation of the System
GMM estimates of Levine, Loayza and Beck (2000) for time period 1961-1995

Variables / System GMM Models    Private Credit (1)    Bank Credit (2)    Liquid Liabilities (3)
Initial GDP                      .475 (1.142)          .379 (1.194)       .172 (1.228)
Government Size                  .421 (1.624)          .167 (1.665)       .782 (1.896)
Trade                            .625 (1.209)          .507 (1.262)       1.529 (1.212)
Inflation                        1.619 (2.745)         1.131 (2.439)      3.758 (2.913)
Schooling                        1.515 (3.491)         1.163 (3.605)      4.266 (3.879)
Black Market Premium             1.474 (.811)          1.509 (.821)       3.039 (1.227)
Private Credit                   2.167 (.949)
Bank Credit                                            1.833 (.841)
Liquid Liabilities                                                        4.338 (1.358)
Constant                         .733 (8.031)          .357 (8.414)       13.697 (11.436)
Year Dummies                     YES                   YES                YES
Number of Instruments            21                    21                 21
Number of Countries              74                    74                 74
Number of observations           439                   439                439
AB-test for AR(2) (p-value)      .906                  .994               .973
Hansen J-test (p-value)          .569                  .340               .709
Sargan test (p-value)            .028                  .006               .385

a. For comparison purpose with Levine, Loayza and Beck (2000) all variables are taken in natural logs with the
exception of Inflation, Schooling and the Black market premium, whose transformation is log(variable+1). The
instruments employed in the estimation are: the lags of the variables from t-4 to t-2, the first difference of the
variables lagged one period, the year dummies and the first difference of the year dummies. Significant at
5% level, Significant at 10% level.

The financial development measures have a positive and significant effect on
economic growth, whereas the black market premium appears to be negatively
associated with GDP growth. The rest of the variables are insignificant.
Table 3.2 presents the estimation results based on the updated dataset for the
time period 1971-2005. In this sample, the effect of financial development on
economic growth becomes negative regardless of the financial development measure considered. As expected, human capital has a positive impact on economic
growth. The rest of the variables have the expected signs. Note that we do not
include the black market premium in the second estimation, since this measure
disappeared in the mid-1990s. The results are robust to alternative proxies
for human capital.
Comparison of Tables 3.1 and 3.2 suggests that once we consider a more recent
period, financial development seems to harm economic growth. Interestingly,
the effect of financial development on economic growth reported in Table 3.2
becomes positive when we restrict the updated dataset to the period before 1995.
Among the possible reasons for such a change, we could consider a slowdown
of technological progress during the 1990s. If this was the case, further financial
development could have a negative influence on economic growth, according to
our hypothesis. We discuss this issue further in the next section.

3.3

Financial Development, Real Sector, and Growth

This section analyzes the empirical facts related to the main hypothesis under
consideration: is the harmonized development of both the financial sector and
the technological possibilities of the country needed for financial development
to have an unambiguously positive effect on economic growth?
In an attempt to answer this question, we re-evaluate the effect of
financial development on economic growth controlling for the technological development of
the country. As proxies for technological development we consider
industrial output growth, unit labor cost growth, and labor productivity growth in
the industrial sector.
Table 3.2: Financial Intermediation and Growth: System GMM estimates for the
time period 1970-2005

Variables / System GMM Models    Private Credit (1)    Bank Credit (2)    Liquid Liabilities (3)
Initial GDP                      .648 (1.285)          .744 (1.337)       .054 (1.289)
Government Size                  4.106 (2.837)         3.687 (3.043)      6.532 (3.363)
Trade                            .573 (1.582)          1.582 (2.817)      1.839 (2.974)
Inflation                        .932 (.905)           1.274 (.603)       1.415 (.920)
Schooling                        3.646 (1.967)         3.492 (1.838)      3.819 (1.985)
Private Credit                   1.957 (.903)
Bank Credit                                            1.680 (.728)
Liquid Liabilities                                                        4.726 (1.638)
Constant                         16.258 (20.351)       4.611 (16.921)     1.831 (15.717)
Year Dummies                     YES                   YES                YES
Number of Instruments            31                    31                 31
Number of Countries              82                    82                 82
Number of observations           367                   367                367
AB-test for AR(2) (p-value)      .360                  .232               .398
Hansen test (p-value)            .228                  .303               .185
Sargan test (p-value)            .295                  .470               .659

b. For comparison purpose with Levine, Loayza and Beck (2000) all variables are taken in natural logs with the
exception of Inflation and Schooling whose transformation is log(variable+1). The instruments employed in
the estimation are: the lags of the variables from t-4 to t-2, the first difference of the variables lagged one
period, the year dummies and the first difference of the year dummies. Significant at 1% level,
Significant at 5% level, Significant at 10% level.

As pointed out by Reinhart and Rogoff (2008) and documented
in Kaminsky and Reinhart (1999), the majority of historical crises are preceded
by financial liberalization. Financial liberalization took place in the United
States before the financial crisis of 2007. "New unregulated, or lightly regulated,
financial entities have come to play a much larger role in the financial system,
undoubtedly enhancing stability against some kinds of shocks, but possibly increasing vulnerabilities against others. Technological progress has plowed ahead,
shaving the cost of transacting in financial markets and broadening the menu of
instruments" (Reinhart and Rogoff, 2008). These authors analyze the similarities
of the most severe financial crises, among which they define the "big 5" crises:
episodes in Spain (1977), Norway (1987), Finland (1991), Sweden (1991) and
Japan (1992).
First, to obtain some circumstantial evidence, we analyze the relative development of the financial and the real sector preceding the severe crisis
episodes in the economies studied by Reinhart and Rogoff (2008). In particular, we look at financial development, defined as the growth rate of the value
of credits by financial intermediaries to the private sector divided by GDP, four
periods before and after the crisis occurred, for the "big 5" crisis economies. We
also look at the growth rate of labor productivity in the real (industrial) sectors
of these economies prior to and just after the crisis episodes.
Figure 3.2 presents the averages of these measures of financial and technological development, as well as averages of their differences and average economic
growth rates, for the following "big 5" financial crises: Norway (1987), Finland
(1991), Sweden (1991), Japan (1992), and the USA (2007).8 Period t in the figure represents the year of the crisis, and t - i and t + i denote i years before and after the crisis,
respectively.9
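The averaging in event time described above can be sketched as follows. The function aligns each country's yearly series on its own crisis year and averages across countries for each offset in the window. The data structure, the function name and the numbers in the usage example are hypothetical; they are not the series behind Figure 3.2.

```python
def event_window_average(series_by_country, crisis_year, window=4):
    """Average a yearly series across countries in event time.

    series_by_country maps country -> {year: value}; crisis_year maps
    country -> year of the crisis. Offsets with no data are omitted.
    """
    averages = {}
    for offset in range(-window, window + 1):
        values = []
        for country, series in series_by_country.items():
            year = crisis_year[country] + offset
            if year in series:
                values.append(series[year])
        if values:
            averages[offset] = sum(values) / len(values)
    return averages


if __name__ == "__main__":
    # Made-up credit growth figures, for illustration only.
    credit_growth = {
        "A": {1989: 2.0, 1990: 6.0, 1991: 1.0},
        "B": {1990: 4.0, 1991: 8.0, 1992: -1.0},
    }
    crises = {"A": 1990, "B": 1991}
    print(event_window_average(credit_growth, crises, window=1))
```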
As we can observe from Figure 3.2, the amount of private credit increases significantly 1-2 years before the severe crisis episode, while the labor productivity
of the industrial sector of the economy grows at a significantly lower rate during
the two years preceding the financial crisis. Considering the two factors together, the
difference between financial development and what we call here industrial development appears to widen prior to the financial crisis episodes. This indicates
that financial development was, on average, faster than real sector development
before the five severe financial crises considered above.
Figure 3.3 shows the industrial and financial output growth averages across 33
OECD countries - all of them except Estonia - for the time period 1970-2010. We
want to test whether non-synchronized financial and industrial growth (captured
in the figure by significant differences between the two time series) leads to lower
economic growth.
8 We do not have data for the Spanish crisis in 1977.
9 The plots of the individual countries' time series (non-averaged) may be found in Appendix C4.

Figure 3.2: Financial and technological development before and after the "Big 5" crises

Figure 3.4 focuses on the period of the last crisis, and plots the data from
2000 to 2010 for the biggest economies in Europe and the United States. Note that
the gap between output growth in the financial and industrial sectors widens
around 2005 for the US, prior to the financial crisis of 2007.
Further, we explore the effect of the relative development of the financial
and real sectors on economic growth using panel data for a set of countries.
We try to capture both the short-run and the long-run effects. For the long-run estimation we use pooled data across the time period 1970-2005 for 63 developed and
developing countries. For the short-run estimates, we consider a panel of 33
OECD economies, with data averaged over non-overlapping five-year periods for the time
period 1970-2005. Our main proxy for technological development is
industrial output growth. We also consider as determinants of economic growth the difference between the growth rates of the financial and industrial sectors and a quadratic
term of this difference, which takes into account potential non-linearities.

Figure 3.3: Industrial and financial output growth. Average across countries.
Data Source: OECD.

In addition, we include in our analysis the industrial output growth as a control variable, as we want to consider variations of the difference of the growth
rates in the two sectors not determined by the industrial sector growth. Industrial growth is one of the main predictors of economic growth (Kaldor, 1967).
However, there is still an important variation of economic growth which remains
unexplained. We focus on explaining part of this variation through the financial
sector.10
After describing our data, we estimate the impact of excess financial development on economic growth in the long and short run using cross-section analysis
and System GMM, respectively. Then we carry out some robustness checks using other proxy variables for excess financial development. The results provide
strong evidence in favor of our general hypothesis: excess financial development
may harm economic growth.
10 In Appendix C3 we present the estimations that include not only the industrial sector
growth, but also the service sector as a proxy for technological progress. Our conclusions do
not change.

Figure 3.4: Growth rates during the last ten years.

3.3.1

Data

In our empirical analysis we use panel data on 33 OECD countries - all of
them with the exception of Estonia - over the period 1970-2010, taken from the
OECD database, the World Bank, and Levine, Loayza and Beck (2000).11 For the cross-section
estimation we use averaged data for 63 countries over the period 1970-2010. To measure excess financial development, we construct four indicators
of differences between the financial and industrial sectors:
i) Difference between the financial and industrial output growth.
ii) Difference between the private credit divided by GDP and industry output
divided by GDP.
iii) Difference between the financial and industrial unit labor cost growth.
iv) Difference between the financial and industrial unit labor productivity
growth.
11 Unfortunately, we have no data on industrial output growth or industrial productivity for
non-OECD countries.
The first two indicators are our main measures of excess financial development. The intuition behind the choice of the explanatory variables lies in our
understanding of the sources of economic growth of the country. For steady economic development, according to our hypothesis, the balanced co-development of
the financial and real sectors is required. Development (technological progress) in
the real sector ensures growth of the economy's productive possibilities, and prevents
the economy from going into recessions (caused, for example, by the presence of
capacity limits). Financial development is necessary for the economic growth of
the country as it allows the growing capacities of the economy to be fully utilized.
However, whenever the latter exceeds the former, the productive capacity limits
may be reached at some point in time, causing an economic downturn.
The difference between the financial output growth and the real output growth
is computed using data from the OECD. Financial output is measured as
the GDP produced by financial intermediation, real estate, renting and business
activities. Real output is obtained as the GDP produced in industry including
energy. The difference in the growth of both sectors and its quadratic term
partially capture the effect of excess financial development on growth.
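As a minimal sketch of how the first indicator and its quadratic term could be built from sectoral output levels, consider the helper below. It assumes simple percentage growth rates between two periods; the function names and the numbers in the usage example are hypothetical and do not come from the OECD data.

```python
def growth_rate(previous, current):
    """Percentage growth between two output levels."""
    return 100.0 * (current - previous) / previous


def excess_finance_terms(fin_prev, fin_curr, ind_prev, ind_curr):
    """Return the growth difference (financial minus industrial) and its square."""
    diff = growth_rate(fin_prev, fin_curr) - growth_rate(ind_prev, ind_curr)
    return diff, diff ** 2


if __name__ == "__main__":
    # Made-up output levels: finance grows by 6%, industry by 2%.
    print(excess_finance_terms(100.0, 106.0, 100.0, 102.0))
```

Both terms would then enter the growth regression as regressors, with the squared term allowing the effect of the difference to change sign.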
The second indicator of excess financial development is obtained as the difference between the value of credits by financial intermediaries to the private
sector, divided by GDP (the measure from Levine, Loayza and Beck, 2000) and
the real output divided by GDP (Industry share of GDP). The value of credits by
financial intermediaries to the private sector is the preferred measure of Levine,
Loayza and Beck. This measure is also used by, for example, Aghion, Howitt and
Mayer-Foulkes (2005), Oguzoglu and Stengos (2008) and Dabos and Williams
(2009). Industrial output divided by GDP is used as a corresponding measure
of technological development, because its units of measurement are compatible
with the considered financial development indicator.
The third and fourth indicators of financial development are closely related
measures of productivity growth in the two sectors. They are taken from the OECD
dataset and are used for robustness checks. We considered labor productivity both per hour worked and per employee. Both of these measures are highly
correlated with the industrial output growth and with the unit labor cost in industry. The labor productivity of the financial sector, on the contrary, does not
vary much over the time period considered, and it is not strongly correlated with
other measures of financial development, such as the amount of private credit or
financial output.12
We support the claim that financial development contributes to growth, as
found in Levine, Loayza and Beck (2000). However, when there is excess financial development, that is, when the difference between financial and industrial
development is very high, the effect of finance on growth may become negative.
To account for this non-linear relationship, we always include a quadratic term
of the excess financial variable in our regressions.
Table 3.3 shows summary statistics of the main variables used in the estimation equations. Column 1 shows the mean of the variables using the 33 countries
and the 40 years of our panel, from 1970 to 2010. Column 2 shows the standard
deviation of each variable and column 3 the correlation between each variable
and real GDP growth per capita.
Observe the high volatility of all the variables and the negative correlation
between the difference variables and real GDP growth per capita.

3.3.2

Estimation Results

First, we examine the relationship between excess financial development and
growth using a pure cross-sectional estimator. Next, we use GMM dynamic panel
procedures that more comprehensively confront problems induced by country-specific effects and endogeneity.

3.3.2.1

Impact of excess financial development on long-run economic growth: Cross-sectional analysis

Following Levine, Loayza and Beck (2000), we consider not only OECD countries
but also developing countries in the cross-section analysis. Here we focus only
on the difference between private credit and industry share of GDP as a measure
of excess finance, as the other indicators of excess finance are only available for
OECD countries.
12 A detailed description of the data and its sources is presented in Appendix C1.

Table 3.3: Summary Statistics of the main variables

Variables/Statistics                              Mean      St.d.     Corr.
Real Growth Per capita                            2.282     3.116     1.000
Industrial Output Growth                          2.999     5.266     .745
Financial Output Growth                           4.234     4.445     .409
Diff. between Financial and Industrial Growth     1.235     5.884     -.329
Industrial Labor Cost Growth                      6.204     13.126    -.137
Financial Labor Cost Growth                       8.705     12.813    .070
Difference in Labor Cost Growth Rates             3.126     6.960     .286
Labor Productivity Growth in Industry             3.760     4.211     .463
Labor Productivity Growth in Finance              .295      4.781     .027
Difference in Labor Productivities                -3.465    6.147     -.300
Private Credit Share of GDP                       63.195    35.831    -.077
Industry Share of GDP                             25.071    5.287     .123
Difference in Pr. Credit and Industry Shares      39.031    37.556    -.094

c. The sample consists of 33 countries across 40 years, from 1970 to 2010. Column 3 shows the correlation
between each variable and real GDP growth per capita.
The pure cross-sectional analysis uses data averaged over 1970-2005; therefore,
there is one observation per country. The basic regression takes the form:

$$ \text{GROWTH}_i = \alpha + \beta_1 \text{DIFF}_i + \beta_2 \text{DIFF}_i^2 + \beta_3 S_i + \gamma' [\text{CONDITIONING SET}]_i + \varepsilon_i, $$

where the dependent variable, $\text{GROWTH}_i$, equals real per capita GDP growth and
$\text{DIFF}_i$ is the difference between private credit and industry share. We include a
quadratic term of this variable as the expected relationship between excess financial development and growth is not linear. As we want to focus on excess financial
development, we control for the industry share, $S_i$; thus, the DIFF variable captures variations in the difference between the two sectors due to variations in
private credit. CONDITIONING SET represents a vector of conditioning
information that controls for other factors associated with economic growth. The
conditioning information set includes the constant, the logarithm of initial GDP,
a proxy for human capital, government size, inflation and openness to international trade. The control variables are the same as the ones used in the section
above. The initial income variable is used to capture the convergence effect and
school attainment is used to control for the level of human capital. Government
size and inflation capture macroeconomic stability.
To examine whether cross-country variations in the exogenous component
of excess financial development explain cross-country variations in the rate of
economic growth, the legal origin indicators are used as instrumental variables
for excess financial development, $\text{DIFF}_i$, and its square. This cross-section
analysis estimates the structural long-run equilibrium of the model assuming
homogeneity over the 63 countries.
Our method of estimation is the two-step generalized method of moments
(GMM). In our estimation we only use linear moment conditions, which require
the instrumental variables - legal origin variables - to be orthogonal to the error
term, $\varepsilon_i$. In the context of the cross-sectional growth regressions, the moment
conditions mean that legal origin may affect per capita GDP growth only through
the excess financial development variable, $\text{DIFF}_i$. We test this condition using
the Hansen J-statistic. Our instruments have been used extensively in the literature to capture the exogenous effect of financial development on growth.13
We confirmed through an F-test that the instruments are relevant, that is, they are
sufficiently correlated with the endogenous variable, $\text{DIFF}_i$.
Table 3.4 presents the results from the cross-section analysis. Column 1 shows
the results without controlling for the industrial share; in column 2 we control
for the industrial share so that the excess financial variable captures differences in
the two sectors due to variations in the amount of private credit.
The results show that excess private credit decreases economic growth in the
long run. In particular, the optimal rate of financial development is achieved
when private credit to GDP exceeds the industry output share of
GDP by about 70 percentage points. When we do not include the industrial share, the optimal difference between
financial and industrial output is higher (about 80 percentage points). However, without controlling for the industry share, the increase in the difference between the financial and
industrial share variables may be due to changes in industrial output, without
corresponding changes in the financial sector. The rest of the control variables have
the expected signs, though most of them are not statistically significant.
13 For example, in Aghion, Howitt and Mayer-Foulkes (2005), and Dabos and Williams (2009).
See Levine, Loayza and Beck (2000) for more details on the legal origin variables and their
relationship with financial development.

Table 3.4: Long-Run Effects of Excess Financial Development on Growth

Economic Growth              (1)               (2)
Excess Finance               6.457 (3.035)     7.294 (3.416)
(Excess Finance)2            4.008() (2.455)   5.060 (3.047)
Log Trade                    .367 (.236)       2.206 (.235)
Schooling                    .494 (.275)       .268 (.353)
Log Inflation                2.813 (2.249)     2.299 (2.206)
Log Initial GDP              1.148 (.225)      1.192 (.235)
Log Government Size          .046 (.335)       .097 (.336)
Industry Share                                 7.252 (3.169)
Constant                     9.930 (2.123)     8.181 (2.439)
Number of observations       63                63
F-test (p-value)             5.760 (.000)      6.370 (.000)
Hansen J-test (p-value)      .002 (.964)       .020 (.887)

d. Excess financial is defined as the difference between Private credit/GDP and Industry value added/GDP.
Industry share is the Industry value added/GDP. Significant at 1% level, Significant at 5% level,
Significant at 10% level. () Significant at 10.3% level.
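The optimal differences quoted for Table 3.4 can be checked against the reported coefficients with a short calculation. In a quadratic specification the growth-maximizing value of the excess finance variable is the ratio of the linear coefficient to twice the absolute value of the quadratic coefficient. The sketch below assumes that the coefficient on the squared term is negative, as the inverted-U interpretation in the text implies; the function name is hypothetical.

```python
def turning_point(beta_linear, beta_quadratic):
    """Value of DIFF at which beta_linear*DIFF + beta_quadratic*DIFF**2 peaks."""
    if beta_quadratic >= 0:
        raise ValueError("an interior maximum needs a negative quadratic coefficient")
    return -beta_linear / (2.0 * beta_quadratic)


if __name__ == "__main__":
    # Coefficient magnitudes from Table 3.4; the negative sign on the
    # squared term is an assumption based on the inverted-U interpretation.
    for label, b1, b2 in [("column 1", 6.457, -4.008), ("column 2", 7.294, -5.060)]:
        print(label, round(turning_point(b1, b2), 3))
```

Under these assumptions the turning points fall near 0.8 without the industry share control and near 0.7 with it, in line with the figures discussed in the text.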
3.3.2.2

Impact of excess financial development on short-run economic growth: System GMM

For our panel estimation, we follow the strategy outlined in the previous section.
We use the System Generalized-Method-of-Moments (GMM) estimator for dynamic
panel data models introduced by Arellano and Bover (1995) and
augmented by Blundell and Bond (1998).

Our panel consists of data for 33 OECD countries over the period 1970-2010.14
We average data over non-overlapping, five-year periods, so there are seven observations per country (1971-75; 1976-80; 1981-85; etc.). The initial GDP and initial
level of educational attainment correspond to the first year of each observation
interval.
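The construction of non-overlapping five-year averages can be sketched as follows. The function name, its default arguments and the input series are hypothetical; the defaults simply reproduce the seven windows listed above (1971-75 through 2001-05).

```python
def five_year_averages(yearly, first_year=1971, last_year=2005, width=5):
    """Average a {year: value} series over non-overlapping windows.

    Returns {(start, end): mean}; windows with no observations are skipped.
    """
    result = {}
    for start in range(first_year, last_year + 1, width):
        end = start + width - 1
        values = [yearly[year] for year in range(start, end + 1) if year in yearly]
        if values:
            result[(start, end)] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    # Made-up yearly series, for illustration only.
    series = {year: float(year - 1970) for year in range(1971, 2006)}
    for window, mean in five_year_averages(series).items():
        print(window, mean)
```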
Table 3.5 shows the results of the estimation of the effect of excess financial
development on economic growth. This table presents the results of the System
GMM estimator using the difference between the financial and the industrial
output growth as a measure of excess financial development.15 We use the two-step GMM estimation with the standard error correction proposed by Windmeijer
(2005). We also "collapse" the instruments to avoid overfitting the endogenous
variables due to the use of too many instruments (the rule of thumb is to use
a number of instruments smaller than or equal to the number of groups). We also
include period-specific dummies, which, apart from their usual role of capturing
deterministic trends in the data, serve as exogenous instruments.
The results show that financial development has a positive effect on growth in
the short-run, which is in accordance with the existing literature. However, when
the difference between the growth of the financial and industrial output is higher
than 4.45%, the effect of financial development on growth becomes negative. This
is consistent with our hypothesis of excess financial development.
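As a rough check on this threshold, write $\beta_1$ and $\beta_2$ for the coefficients on excess finance and its square (notation introduced here for illustration) and assume that $\beta_2$ is negative, as the inverted-U interpretation implies. The marginal effect of excess finance on growth is then

$$ \frac{\partial\, \text{GROWTH}}{\partial\, \text{DIFF}} = \beta_1 + 2\beta_2\, \text{DIFF}, \qquad \text{which is zero at} \qquad \text{DIFF}^{*} = -\frac{\beta_1}{2\beta_2}. $$

With the rounded estimates reported in Table 3.5, $\beta_1 \approx 0.211$ and $|\beta_2| \approx 0.024$, this gives $\text{DIFF}^{*} \approx 0.211 / 0.048 \approx 4.4$ percentage points, close to the 4.45% quoted above; the small gap presumably reflects rounding of the reported coefficients.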
Next, we examine if our results are robust to the specification of our main variable of interest, excess financial development. Given that financial innovation is
not observable, we need to use proxy variables that capture financial innovations
which are not an outcome of industrial innovations. In our main analysis, we
assume that the difference in output growth rates between the financial and industrial sector captures the excess financial development. However, other factors
such as difference in relative prices or unit labor costs, difference in the productivity of labor, or difference between the private credit share and industry share
could be capturing excess financial development as well.
14 We do not have data on the openness to trade and schooling for the year 2010.
15 Table ?? in Appendix C3 shows the results using as excess financial development the
difference between financial and industrial plus service output growth.

Table 3.5: System GMM. Excess financial development and growth

Economic Growth                  System GMM
Excess Finance                   .211 (.115)
(Excess Finance)2                0.024 (.010)
Industry growth                  .607 (.106)
Log Trade                        .793 (.337)
Schooling                        1.479 (1.702)
Log Inflation                    3.387 (4.890)
Log Initial GDP                  .354 (.623)
Log Government Size              .766 (1.506)
Constant                         .035 (.056)
Year Dummies                     YES
Number of Instruments            23
Number of Countries              33
Number of observations           166
AB-test for AR(2) (p-value)      .456
Hansen test (p-value)            .663
Sargan test (p-value)            .907

e. Excess financial is defined as the difference between financial output growth and industrial output growth. The
instruments employed in the estimation are the variables lagged two periods, the difference of the variables
lagged one period, the year dummies and the first difference of the year dummies. The standard error
correction proposed by Windmeijer (2005) is implemented. Significant at 1% level, Significant at 5%
level, Significant at 10% level.

Table 3.6 shows the effect of excess financial development on growth using
other proxy variables for excess finance. Column 1 presents the results using the
difference between the growth rates of the unit labor cost of the financial and
industrial sector as a measure of excess financial development. Column 2 shows
the results using the difference between the financial and industrial productivity
of labor units as a proxy for excess financial development. Column 3 shows the
results using the difference between the private credit share to GDP and industry
share to GDP as a measure of excess financial development.
Table 3.6: System GMM. Excess financial development and growth. Using other
proxies for excess finance

Variables/Models:              Labor Cost        Productivity      Credit-Industry Share
                               (1)               (2)               (3)
Excess Finance                 .081 (.171)       .036 (.267)       3.727 (3.327)
(Excess Finance)²              .031() (.019)     .014 (.035)       4.306 (2.273)
Industry Labor Cost Growth     .147 (.122)
Industry Productivity Growth                     .352 (17.866)
Industry Share                                                     .188 (.178)
Log Trade                      1.331 (1.761)     .007 (2.125)      3.048 (1.098)
Schooling                      .702 (2.864)      2.138 (5.267)     .497 (3.124)
Log Inflation                  3.613 (7.389)     2.683 (6.597)     16.793 (15.143)
Log Initial GDP                .851 (1.061)      .122 (1.558)      2.210 (1.002)
Log Government Size            2.255 (2.358)     4.312 (4.689)     .127 (2.462)
Constant                       11.259 (16.512)   13.081 (12.202)   13.825 (7.775)
Year Dummies                   YES               YES               YES
Number of Instruments          23                23                23
Number of Countries            31                25                31
Number of observations         141               143               140
AB-test for AR(2) (p-value)    .396              .076              .113
Hansen test (p-value)          .308              .551              .314
Sargan test (p-value)          .156              .263              .103

f. In column 1 excess finance is defined as the difference between the growth rates of the unit labor cost of
financial and industrial sectors. Column 2 presents results using as a proxy for financial development the
differences between the financial and industrial productivity of labor units. Column 3 shows the results using
the difference between the private credit share to GDP and industry share to GDP as a measure of excess
financial development. The instruments employed in the estimation are the variables lagged two periods, the
difference of the variables lagged one period, the year dummies and the first difference of the year dummies.
Significant at 1% level, Significant at 5% level, Significant at 10% level, () Significant at 12% level.

When we use the difference between the unit labor cost in the two sectors
as a proxy for excess financial development, the negative effect of excess finance,
given by the square of the difference, is significant only at the 12% level.
The effect of financial development on growth becomes negative when the
growth of the unit labor cost in the financial sector exceeds the growth of the unit
labor cost in the industrial sector by 1.33%. On the other hand, when we use the
difference in labor productivity between the two sectors, none of the variables
is significant, although they have the expected signs. Finally, when we use the
difference between the private credit share of GDP and the industry share of GDP, we
find that financial development has a negative impact on growth when
private credit to GDP exceeds the industry share of GDP by more than 43.3%. That is,
the private credit share in the economy should not exceed the industrial output
share by more than 43.3%; otherwise, the excess of credit might reduce economic
growth. The non-significance of the other factors affecting growth may be due to the
small number of observations available for the analysis.
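The thresholds quoted in this section follow from the quadratic specification: if growth depends on excess finance x through b1*x + b2*x², the marginal effect b1 + 2*b2*x changes sign at x* = -b1/(2*b2). The sketch below recomputes these turning points from the printed coefficients. It rests on two assumptions that should be read as such: the function and variable names are hypothetical, and the negative sign on the quadratic coefficient is taken from the discussion in the text, since the tables as printed here show only magnitudes.

```python
# Sketch: turning point of a quadratic growth specification.
# Assumed form: growth = b1 * x + b2 * x**2 + controls, with b1 > 0 and b2 < 0.
# The marginal effect b1 + 2 * b2 * x is zero at x* = -b1 / (2 * b2).


def turning_point(linear: float, quadratic: float) -> float:
    """Return the value of x at which the marginal effect of x is zero."""
    if quadratic >= 0:
        raise ValueError("an inverted U-shape needs a negative quadratic term")
    return -linear / (2.0 * quadratic)


# Coefficient magnitudes as printed in Tables 3.5 and 3.6. The negative sign on
# the quadratic term is an assumption taken from the text, not from the tables.
ESTIMATES = {
    "output growth gap (Table 3.5)": (0.211, -0.024),
    "unit labor cost growth gap (Table 3.6, column 1)": (0.081, -0.031),
    "credit share minus industry share (Table 3.6, column 3)": (3.727, -4.306),
}

if __name__ == "__main__":
    for label, (b1, b2) in ESTIMATES.items():
        print(f"{label}: turning point = {turning_point(b1, b2):.3f}")
```

With the rounded coefficients, the recomputed turning points are close to, but not identical to, the thresholds quoted in the text (4.45, 1.33 and 43.3%). The remaining gap presumably reflects the rounding of the printed coefficients, and the third value presumably comes out as a fraction of GDP rather than a percentage.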

3.4 Discussion

In this section we briefly discuss a possible theoretical justification for the existence
of excess financial development.
The justification may come from the theory of informational overshooting introduced by Rob (1991) and Zeira (1994), and applied later by Zeira (1999) to
explain credit crashes. According to this theory, the economy has some unknown
production capacity limit. The limit may be due to bounded technology,
bounded demand, or scarce resources. Rational agents use all the available information to form expectations about the capacity limit of the economy. As long as
the limit has not been reached, the expectations about it become more and more
optimistic. Eventually, the expectations are so high that the economy overshoots
its capacity limit: the resources invested in production are too large in
comparison to the production possibilities. Expectations, investment, and
economic activity fall at this point, causing severe economic distress.
We propose to consider the technological progress of the country as the source of
growth of the economy's capacity. Indeed, the introduction of new technologies and the invention
of new goods and materials serve as a substitute for such scarce production
factors as, for example, natural resources and the labor force. Without technological
progress, economies would stagnate at the levels of development defined by their
resource capacities.

On the other hand, we propose to consider financial development as a factor facilitating economic activities. As the technological possibilities of the economy grow,
the demand for financial services increases (see, for example, Aghion, Howitt and
Mayer-Foulkes (2005) for an explanation based on a Schumpeterian model of economic growth). Thus, financial development is a crucial determinant of economic
growth. However, when new financial technologies are introduced at a faster rate
than new production technologies, the speed with which the economy approaches
its capacity limit increases. Therefore, excessively fast financial development may ultimately lead
to lower economic growth by increasing the probability of economic overshooting. Note that this framework does not require any market frictions other than the
lack of information.
The idea of a market crash based on informational overshooting has been
implemented by several authors, among them Barbarino and Jovanovic (2007)
and Biais, Rochet, and Woolley (2009).
Another justification of the ideas tested in the previous section could rely on
the presence of negative externalities from the financial sector's operations.

3.5 Conclusions

We analyze the effect of financial development on economic growth. Our analysis,
based on three different panels, i) 33 OECD countries over the period 1970-2010,
ii) the 74 countries used by Levine, Loayza and Beck (2000) over the period
1961-1995 and iii) 82 developed and developing countries over the period
1971-2005, reveals the following:
First, financial development measured as private credit to GDP has a positive
effect on economic growth over the years 1960 to 1995. However, when we use
an extended panel for the years 1970-2005, the effect of financial development
on economic growth turns negative. A plausible explanation is the slowdown
of technological progress during the 1990s. This, together with a sharp increase
in private credit to GDP, could generate an excess of financial development
leading to a negative impact on economic growth.

Second, there is a non-linear relationship between financial development and
economic growth. In particular, the effect of financial development on economic
growth is maximum when the private credit to GDP is around 122%. This result
is consistent with Arcand, Berkes and Panizza (2011).
Third, when the financial development is not accompanied by technological
development (reflected in industrial output growth), financial development might
have a negative impact on economic growth, both in the short and long run. In
particular, in the short-run, when the difference between growth rates is higher
than 4.45% (or when the private credit to GDP exceeds the industrial output to
GDP by more than 43%), the effect of financial development on economic growth
becomes negative.
Our results should be important for policy makers. When private credit
to GDP exceeds the industrial share by more than 43%, governments should
implement policies aimed at reducing the amount of credit, as the economy has
likely reached its capacity limit. The same applies when financial output
growth exceeds industrial output growth by more than 4.45%. Otherwise,
the excess financial development will slow down economic growth and might
even lead to a severe financial crisis.
In further research we plan to explain the existence of the upper bound on
the optimal level of financial development by the limited productive capacities of
the economy, or by the negative externalities produced by the financial system.

3.6 Appendix

C1. Description of the Data Variables

Private Credit: credit by deposit money banks and other financial institutions
to the private sector as a share of GDP, adjusted for inflation. Source: Levine,
Loayza and Beck (2000), Beck and Demirgüç-Kunt (2009).
Bank Credit: credit by deposit money banks to the private sector as a share
of GDP, adjusted for inflation. Source: Levine, Loayza and Beck (2000), Beck
and Demirgüç-Kunt (2009).
Liquid liabilities: liquid liabilities as a share of GDP, adjusted for inflation.
Source: Levine, Loayza and Beck (2000), Beck and Demirgüç-Kunt (2009).
Initial GDP : real per capita GDP. Source: World Development Indicators.
Economic growth: real per capita GDP growth rate. Source: World Development Indicators.
Government size: government expenditure as share of GDP. Source: World
Development Indicators.
Openness to trade: sum of real exports and imports as share of real GDP.
Source: World Development Indicators.
Inflation rate: percentage change of CPI index. Source: World Development
Indicators.
Black market premium: ratio of black market exchange rate and official exchange rate minus one. Source: Levine, Loayza and Beck (2000).
Legal origin: dummy variables for British, French, German and Scandinavian
legal origin. Source: Levine, Loayza and Beck (2000).
Tertiary: enrollment in tertiary education. Source: World Development Indicators.

Schooling: average years of secondary schooling in the population over 15.
Source: Levine, Loayza and Beck (2000).
Industry share: industry value added as a share of GDP. Source: OECD
dataset, World Bank Development Indicators.
Labour Productivity in Industry (C-E): labour productivity per hour: gross
value added in constant prices per hour worked in national currency, annual
growth rate. Source: OECD dataset.
Labour Productivity in Finance (J-K): labour productivity per hour: gross
value added in constant prices per hour worked in national currency, annual
growth rate. Source: OECD dataset.
Industry Labor cost: unit labour cost in Industry (C-E), annual growth rate.
Source: OECD dataset.
Finance Labor cost: unit labour cost in Financial and Business services (J-K),
annual growth rate. Source: OECD dataset.
C2. Summary Statistics

Table 3.7: Summary statistics for dataset of 74 Countries, years 1960-1995

Variables/Statistics      Mean        St.d.       Min        Max
Real Growth Per Capita    1.768       2.939       -10.000    11.000
Initial GDP               3746.429    4716.518    108        20131
Government Size           14.820      5.959       4          45
Trade                     59.981      40.716      9          315
Inflation                 .156        .321        -.03       3.5
Schooling                 4.327       2.820       .04        12
Black Market Premium      .677        5.424       -.05       110
Private Credit            36.712      32.457      0          206
Bank Credit               28.598      23.971      0          166
Liquid Liabilities        42.450      28.175      5          191

C3. More Robustness Checks

Re-estimation of the results in Levine, Loayza and Beck (2000)
Table 3.8: Summary statistics for dataset of 82 Countries, years 1970-2005

Variables/Statistics      Mean        St.d.       Min        Max
Real Growth Per Capita    1.878       3.233       -19        20
Initial GDP               6748.813    8701.090    66         47064
Government Size           15.961      6.004       4          47
Trade                     70.297      38.539      8          309
Inflation                 .295        1.619       .005       13
Tertiary                  38.081      22.056      1          92
Black Market Premium      .258        .880        -.05       13
Private Credit            48.114      37.249      2          203
Bank Credit               43.144      37.249      7          193
Liquid Liabilities        49.473      35.984      7          399

Table 3.9 presents the re-estimation of the results in Levine, Loayza and Beck
(2000) including a quadratic term of financial development. We can observe the
non-linear relationship between financial development and economic growth.
Another measure of technological progress
Table 3.10 presents the estimation results when the services sector is added to
the measure of technological development of the country. In this case, excess
financial development is defined as the difference between financial output growth
and industry plus services output growth (that is, the proxy for technological progress is
the industry plus services output growth). Column (1) presents the results
controlling for schooling and using data from 1970 to 2005. Column (2) shows
the results excluding the proxy for human capital but extending the panel to
2010. We observe that our results (presented in percentage points) are robust to
the inclusion of the services sector. As expected, industry plus services growth
has a higher explanatory power and, thus, there is less uncertainty about output
growth in the augmented model.
Table 3.9: Re-estimation of the results in Levine, Loayza and Beck (2000) with
quadratic term of financial development

Variables/System GMM Models:   Private Credit    Bank Credit       Liquid Liabilities
                               (1)               (2)               (3)
Initial GDP                    .484 (1.609)      .288 (2.024)      .345 (1.078)
Government Size                .301 (1.724)      .471 (1.709)      .935 (1.794)
Trade                          .272 (1.135)      .588 (1.376)      1.027 (.953)
Inflation                      .356 (1.211)      .335 (1.248)      .511 (1.135)
Schooling                      .784 (3.581)      1.276 (4.948)     3.546 (2.654)
Black Market Premium           1.496 (.760)      1.584 (.799)      2.229 (.947)
Private Credit                 8.053 (3.176)
(Private Credit)²              3.293 (1.059)
Bank Credit                                      11.236 (4.967)
(Bank Credit)²                                   6.206 (2.961)
Liquid Liabilities                                                 12.502 (4.368)
(Liquid Liabilities)²                                              5.950 (2.814)
Constant                       4.123 (11.402)    2.144 (13.468)    3.999 (8.748)
Year Dummies                   YES               YES               YES
Number of Instruments          23                23                23
Number of Countries            74                74                74
Number of observations         441               441               441
AB-test for AR(2) (p-value)    .845              .845              .820
Hansen J-test (p-value)        .437              .364              .548
Sargan test (p-value)          .036              .016              .481

g. For comparison purposes with Levine, Loayza and Beck (2000), all variables are taken in natural logs with the
exception of Inflation, Schooling and the Black market premium, whose transformation is log(variable+1). The
instruments employed in the estimation are: the lags of the variables from t-4 to t-2, the first difference of the
variables lagged one period, the year dummies and the first difference of the year dummies. Significant at
5% level, Significant at 10% level.

C4. Growth series during the Big 5 financial crises
Figures 3.5, 3.6 and 3.7 show the real GDP growth rate, the real industry
productivity growth rate and the real private credit to GDP growth rate for Norway,
Finland, Sweden, Japan and the USA during several years preceding and following
their main financial crises (as defined by Reinhart and Rogoff, 2008). Note
that during the years preceding the financial crisis, the private credit to GDP and
industrial productivity growth rates are not synchronized: the amount of private
credit grows at a faster rate while the growth in industrial productivity slows
down. This is consistent with the excess financial development hypothesis.

Table 3.10: System GMM. Excess financial development and growth, using another proxy for the excess financial development

Variables/Data sets:            1970-2005         1970-2010
                                (1)               (2)
Excess Finance                  .234 (.127)       .332 (.149)
(Excess Finance)²               .026 (.014)       .038 (.016)
Industry plus Service Growth    1.002 (.203)      .877 (.267)
Log Trade                       .193 (.646)       .687 (.532)
Schooling                       .731 (1.601)
Log Inflation                   .410 (.548)       .379 (.364)
Log Initial GDP                 .572 (1.609)      .879 (.671)
Government Size                 1.178 (.819)      1.454 (.738)
Constant                        2.294 (11.450)    2.610 (7.786)
Year Dummies                    YES               YES
Number of Instruments           31                29
Number of Countries             32                32
Number of observations          159               201
AB-test for AR(2) (p-value)     .929              0.364
Hansen J-test (p-value)         .550              .530
Sargan test (p-value)           .340              .952

h. The Excess financial development variable is the difference between financial and industrial plus service output
growth. The instruments employed in the estimation are: the lags of the variables from t-3 to t-2, the first
difference of the variables lagged one period, the year dummies and the first difference of the year dummies.
Significant at 5% level, Significant at 10% level.

Figure 3.5: Economic growth during the Big 5 financial crises

Figure 3.6: Industrial productivity growth during the Big 5 financial crises

Figure 3.7: Private credit to GDP growth during the Big 5 financial crises
References
[1] Acedo, F. J., Barroso, C., Casanueva, C., & Galan, J. L. (2006). Co-authorship in management and organizational studies: An empirical and
network analysis. Journal of Management Studies, 43, 957-983.
[2] Aghion Ph., P. Howitt and D. Mayer-Foulkes, 2005. The Effect Of Financial
Development On Convergence: Theory And Evidence. Quarterly Journal of
Economics, 120 (1): 173-222.
[3] Akerlof, G. (1970), The Market for Lemons: Quality Uncertainty and the
Market Mechanism. Quarterly Journal of Economics, 84, 3, 488-500.
[4] Arcand J. L., E. Berkes and U. Panizza, 2011. Too Much Finance? Working
Paper.
[5] Arellano, M., Bond, S.R., (1991). Some specification tests for panel data:
Monte Carlo evidence and an application to employment equations. Review
of Economic Studies 58, 277-298.
[6] Arellano, M., and O. Bover, 1995. Another look at the instrumental variable
estimation of error-components models. Journal of Econometrics, 68 (1): 29-51.
[7] Ashley, R., C.W.J. Granger, and R. Schmalensee, (1980), Advertising and
Aggregate Consumption: An Analysis of Causality, Econometrica, 48, 1149-1167.
[8] Azoulay, P., J. Graff Zivin, J. Wang (2010), Superstar Extinction, Quarterly
Journal of Economics, 125, 2, 549-589.
[9] Baerveldt, C., M.A.J. Van Duijn, L. Vermeij, and D.A. Van Hemert (2004).
Ethnic boundaries and personal choice: Assessing the influence of individual
inclinations to choose intra-ethnic relationships on pupils' networks. Social
Networks, 26: 55-74.
[10] Bala V. and Goyal S. (1998). Learning from neighbours. Review of Economic
Studies, 65, 595-621.
[11] Barbarino A. and B. Jovanovic, 2007. Shakeouts and Market Crashes. International Economic Review, 48 (2): 385-420.
[12] Baum, Christopher F., Schaffer, Mark E. and Stillman, S. (2003). Instrumental variables and GMM: Estimation and testing. Stata Journal, 3(1),
1-31.
[13] Beck Th. and A. Demirgüç-Kunt, 2009. Financial Institutions and Markets
Across Countries and over Time: Data and Analysis. World Bank Policy
Research Working Paper No. 4943.
[14] Belmaker, J., Cooper, N., Lee, T.M. and Wilman, H. (2010). Specialization
and the road to academic success. Frontiers in Ecology and the Environment,
Volume 8 (10).
[15] Bertrand, M., E.F.P. Luttmer, and S. Mullainathan (2000), Network effects
and welfare cultures, Quarterly Journal of Economics, 115, 3, 1019-1055.
[16] Biais Br., J.-Ch. Rochet and P. Woolley, 2009. Rents, Learning and Risk in
the Financial Sector and Other Innovative Industries. Working Paper.
[17] Blundell, R. and S. Bond, 1998. Initial conditions and moment restrictions
in dynamic panel data models. Journal of Econometrics, 87 (1): 115-143.
[18] Bonacich, P. (1987). Power and centrality: a family of measures. American
Journal of Sociology 92(5): 1170-1182.
[19] Boucher V., Bramoullé, Y., H. Djebbari, and B. Fortin (2010), Do Peers Affect Student Achievement? Evidence from Canada Using Group Size Variation. Mimeo.
[20] Box, G. E. P., and D. R. Cox (1964), An analysis of transformations, Journal
of the Royal Statistical Society, Series B 26: 211-252.
[21] Bramoullé, Y., H. Djebbari, and B. Fortin (2009), Identification of peer
effects through social networks. Journal of Econometrics, 150, 1, 41-55.
[22] Bramoullé, Y., Kranton R. (2007). Public goods in networks. Journal of
Economic Theory, 135:1, 478-494.
[23] Brock, W. and S. Durlauf (2001), Interaction-based Models, in J. Heckman
and E. Leamer (eds), Handbook of Econometrics, Volume 5, Amsterdam:
North Holland.
[24] Brown, L. D. (2005). The importance of circulating and presenting
manuscripts: Evidence from the accounting literature. Accounting Review,
80, 55-83.
[25] Calvó-Armengol, A., Jackson M.O. (2004). The effects of social networks on
employment and inequality. American Economic Review, 94(3): 426-54.
[26] Calvó-Armengol, A., E. Patacchini and Y. Zenou (2009), Peer effects and
social networks in education, Review of Economic Studies, 76, 1239-1267.
[27] Cameron, A. C., and P. K. Trivedi, Microeconometrics: Methods and Applications (Cambridge: Cambridge University Press, 2005).
[28] Chung, K. H., Cox, R. A. K. and Kim K.A. (2009). On the relation between
intellectual collaboration and intellectual output: Evidence from the finance
academe. The Quarterly Review of Economics and Finance, 49, 893-916.
[29] Conley, T. and C. Udry (2008), Learning about a new technology: Pineapple
in Ghana, Economic Growth Center, Yale University.
[30] Conti, G., Galeotti, A., Mueller, G., and Pudney S. (2009). Popularity. Unpublished manuscript.
[31] Dabos M. and T. Williams, 2009. A reevaluation of the Impact of Financial Development on Economic Growth and its sources by regions. Working
Paper, Universidad de Belgrano.
[32] Defazio, D., Lockett, A., & Wright, M. (2009). Funding incentives, collaborative dynamics and scientific productivity: Evidence from the European
framework program. Research Policy, 38 (2), 293-305.
[33] Diebold, F.X. and Mariano, R.S., (1995), Comparing predictive accuracy,
Journal of Business and Economic Statistics, 13, 253-263.
[34] Ductor, L., Fafchamps, M., Goyal, S. and Van der Leij, M. J. (2011). Social
networks and research output. Mimeo.
[35] Duflo, E. and E. Saez (2003), The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment,
Quarterly Journal of Economics, 118, 3, 815-842.
[36] Fafchamps, M. (2004), Market institutions in Sub-Saharan Africa. MIT
Press, Cambridge, Mass.
[37] Fafchamps, M., S. Goyal, and M. J. van der Leij (2010), Matching and
Network Effects, Journal of the European Economic Association, 8, 1, 203-231.
[38] Fair, R.C., Shiller, R., (1990), Comparing information in forecasts from
econometric models. American Economic Review 80, 375-389.
[39] Fildes, R., & Makridakis, S. (1995). The impact of empirical accuracy studies
on time series analysis and forecasting. International Statistical Review 63,
289-308.
[40] Fong, E., and W.W. Isajiw (2000). Determinants of friendships choices in
multiethnic society. Sociological Forum, 15(2): 249-272.
[41] Gennaioli N., A. Shleifer and R. W. Vishny, 2010. Neglected Risks, Financial
Innovation, and Financial Fragility. NBER Working Paper No. 16068.
[42] Glaeser, E. and J. Scheinkman (2002), Non-market interactions. Econometric
Congress: Advances in Economic Theory and Econometrics, edited by M. Dewatripont, L.P. Hansen and S. Turnovsky. Cambridge University Press.
[43] Goyal, S. (2011). Social networks in economics. Sage Handbook of Social
Network Analysis, edited by J. Scott and P. Carrington. SAGE Publications
Ltd.
[44] Goyal, S. (2007), Connections: an introduction to the economics of networks.
Princeton University Press, Princeton, New Jersey.
[45] Goyal, S., Van der Leij, Marco J., and Moraga, J.L. (2006). Economics: An
emerging small world. Journal of Political Economy, 114, 2, 403-412.
[46] Granger, C.W.J., (1980), Testing for Causality: A Personal Viewpoint, Journal of Economic Dynamics and Control, 2, 329-352.
[47] Granovetter, M. (1985), Economic Action and Social Structure: The Problem of Embeddedness, American Journal of Sociology, 91, 3, 481-510.
[48] Hackett E.J. (2005). Essential tensions: identity, control, and risk in research.
Social Studies of Science, 35: 787-826.
[49] Hollis A. (2001). Co-authorship and the output of academic economists.
Labour Economics, 8, 503-530.
[50] Hong, Y., Lee, T., (2003), Inference on predictability of foreign exchange
rates via generalized spectrum and nonlinear time series models. Review of
Economics and Statistics 85, 1048-1062.
[51] Hudson, J. (1996). Trends in multi-authored papers in economics. Journal
of Economic Perspectives, 10, 153-158.
[52] Inoue, A., Kilian, L., (2004), In-sample or out-of-sample tests of predictability? Which one should we use? Econometric Reviews 23, 371-402.
[53] Jackson, M. O. (2008). Social and Economic Networks. Princeton University
Press, Princeton, New Jersey.
[54] Jackson, M. O. and Asher Wolinsky (1996). A Strategic Model of Economic
and Social Networks. Journal of Economic Theory, 71, 44-74.
[55] Kalaitzidakis, P., T. Mamuneas, and T. Stengos (2003), Rankings of academic journals and institutions in economics, Journal of the European
Economic Association, 1, 6, 1346-1366.
[56] Kaldor N., 1967. Strategic Factors in Economic Development, New York,
Ithaca.
[57] Kennedy, P. (2008). A guide to econometrics. 6th Edition. Blackwell Publishing.
[58] Kodrzycki, Y., and Yu, P. D. (2006). New Approaches to Ranking Economics
Journals. Contributions to Economic Analysis & Policy, Vol. 5: Iss. 1, Article
24.
[59] Kuha, J. (2004), AIC and BIC: Comparisons of assumptions and performance. Sociological Methods & Research, 33, 188-229.
[60] Laband, D. N., & Tollison, R. D. (2000). Intellectual collaboration. Journal
of Political Economy, 108, 632-662.
[61] Leahey, E. (2006). Gender differences in productivity: research specialization
as a missing link. Gender & Society, 20, 754-780.
[62] Lee, S. and Bozeman, B. (2005). The impact of research collaboration on
scientific productivity. Social Studies of Science 35(5), 673-702.
[63] Lettau, M., Ludvigson, S., (2001), Consumption, aggregate wealth, and expected stock returns. Journal of Finance 3, 815-849.
[64] Levine R., N. Loayza and Th. Beck, 2000. Financial Intermediation and
Growth: Causality and Causes. Journal of Monetary Economics, 46 (1):
31-77.
[65] Lin, X. (2010). Identifying peer effects in student academic achievement by
a spatial autoregressive model with group unobservables. Journal of Labor
Economics 28, 825-860.
[66] Liu, X., Patacchini, E., Zenou, Y. and L-F. Lee (2012). Criminal networks:
Who is the key player? Mimeo.
[67] Manski, C. (1993), The Identification of endogenous social effects: the reflection problem, Review of Economic Studies, 60, 3, 531-542.
[68] Mayer, A. and S.L. Puller (2008), The old boy (and girl) network: Social
network formation on university campuses, Journal of Public Economics, 92,
1-2, 329-347.
[69] McDowell, J.M. and M. Melvin (1983), The determinants of co-authorship:
An analysis of the economics literature, Review of Economics and Statistics,
65, 1, 155-160.
[70] McPherson, M., L. Smith-Lovin, and J.M. Cook (2001). Birds of a Feather:
Homophily in Social Networks. Annual Review of Sociology, 27: 415-44.
[71] Medoff, M. H. (2003). Collaboration and the quality of economics research.
Labour Economics, 10, 597-608.
[72] Melin, G., & Persson, O. (1996). Studying research collaboration using coauthorships. Scientometrics, 36(3), 363-377.
[73] Michalopoulos St., L. Laeven and R. Levine, 2009. Financial Innovation and
Endogenous Growth, NBER working paper series, Working Paper 15356.
[74] Mihaly, K. (2009). Do More Friends Mean Better Grades? Student Popularity and Academic Achievement. RAND Working Paper Series WR-678.
[75] Moffitt R. (2001), Policy interventions, low-level equilibria, and social interactions, in: S. Durlauf and P. Young (eds.), Social Dynamics, Cambridge:
MIT Press.
[76] Montgomery, J. (1991), Social networks and labor-market outcomes: toward
an economic analysis, American Economic Review, 81, 5, 1408-1418.
[77] Moody, J. (2001). Race, school integration, and friendship segregation in
America. American Journal of Sociology, 107(3): 679-716.
[78] Munshi, K. (2003), Networks in the modern economy: Mexican migrants in
the U.S. labor market, Quarterly Journal of Economics, 118, 2, 549-599.
[79] Oguzoglu U. and Th. Stengos, 2008. Can Dynamic Panel Data Explain the
Finance-Growth Link? An Empirical Likelihood Approach. Working Paper.
[80] Pesaran, M.H., Timmermann, A., (1995), Predictability of stock returns: robustness and economic significance. Journal of Finance 50, 1201-1228.
[81] Pötscher, B.M., (1991), Effects of model selection on inference, Econometric
Theory 7, 163-185.
[82] Presser, S., (1980). Collaboration and the quality of research. Social Studies
of Science 10, 95-101.
[83] Rapach, D.E., Wohar, M.E., (2002), Testing the monetary model of exchange rate determination: new evidence from a century of data. Journal of
International Economics 58, 359-385.
[84] Reinhart, C. M., and K. S. Rogoff. 2008. Is the 2007 US Sub-prime Financial Crisis So Different? An International Historical Comparison. American
Economic Review, 98 (2), 339-44.
[85] Rob R., 1991. Learning and Capacity Expansion under Demand Uncertainty.
The Review of Economic Studies, 58 (4), 655-675.
[86] Santomero A. M., and J. J. Seater, 2000. Is there an optimal size for the
financial sector? Journal of Banking and Finance, 24 (6), 945-965.
[87] Sauer, Raymond D., (1988). Estimates of the Returns to Quality and Coauthorship in Economic Academia, Journal of Political Economy, 96(4), pp.
855-66.
[88] Singh, J., (2007). External collaboration, social networks and knowledge
creation: Evidence from scientific publications. In Danish Research Unit of
Industrial Dynamics Summer Conference 2007, Denmark, 2007.
[89] Stock, J.H., Watson, M.W., 1999. Forecasting inflation. Journal of Monetary
Economics, 44, 293-335.
[90] Sullivan, R., Timmermann, A., White, H., (1999). Data-snooping, technical
trading rule performance, and the bootstrap. Journal of Finance 54, 1647-1691.
[91] Swanson, N.R., White, H., (1997). A model-selection approach to real-time
macroeconomic forecasting using linear models and artificial neural networks.
The Review of Economics and Statistics 79, 265-275.
[92] Tashman, L. J. (2000). Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting, 16, 437-450.
[93] Topa, G. (2001). Social Interactions, Local Spillovers, and Unemployment,
Review of Economic Studies, 68(2): 261-95.
[94] van der Leij, M. J. (2006). The Economics of Networks: Theory and Empirics.
Tinbergen Institute Research Series No. 384. Amsterdam: Thela Thesis.
[95] van der Leij, M.J. and Goyal, S. (2011). Strong Ties in a Small World, Review
of Network Economics, 10, 2, 1.
[96] Waldinger, F. (2010). Quality Matters: The expulsion of professors and
the consequences for PhD student outcomes in Nazi Germany. Journal of Political Economy,
118(4), 787-831.
[97] Wang Zhu (2007). Technological innovation and market turbulence: The
dot-com experience. Review of Economic Dynamics, 10 (1), 78-105.
[98] Wasserman, S. and K. Faust (1994), Social Network Analysis. Cambridge
University Press.
[99] Windmeijer, F., 2005. A finite sample correction for the variance of linear
efficient two-step GMM estimators. Journal of Econometrics, 126 (1), 25-51.
[100] Zeira J., 1994. Informational Cycles. The Review of Economic Studies, 61
(1): 31-44.
[101] Zeira J., 1999. Informational overshooting, booms, and crashes. Journal of
Monetary Economics, 43 (1): 237-257.
[102] Zuckerman, Harriet, and Merton, Robert K. (1973). Age, Aging, and Age
Structure in Science. In The Sociology of Science: Theoretical and Empirical
Investigations, by Robert K. Merton. Chicago: Univ. Chicago Press.

The undersigned Committee, having met on this date, agreed to award, by ____________,
the grade of ____________ to the Doctoral Thesis of Mr. Lorenzo Ductor Gómez.

Alicante, 22 May 2012

The Secretary,
The President,

UNIVERSIDAD DE ALICANTE
CEDIP

This Thesis by Mr. ________________________________________________ has been
registered under no. ____________ in the corresponding entry register.
Alicante, ___ of __________ of _____

The Registrar,

The defense of the doctoral thesis by Mr. Lorenzo Ductor Gómez was conducted in
the following languages: English and Spanish, which, together with the fulfillment of the remaining requirements
established in the UA's own regulations, grants him the mention of Doctor Europeo (European Doctorate).
Alicante, 22 May 2012
THE SECRETARY

THE PRESIDENT
