
BUSC

Data Analytics
Kevin Choi
November 20, 2014

Abstract:
The data competition asked two things of its participants: choose a particular
stage of the linear model of innovation [1], and advise the Massachusetts government [2]
on how to foster innovation. The purpose of my analysis is twofold. First, it
briefly discusses why I have chosen the development stage of the innovation model.
Second, it presents my recommendations to the Massachusetts government, centered on
utility patents, all of which aim to foster innovation.

Introduction:
I chose the development stage of the linear model of innovation mainly because of
its historical consistency and technological relevance. By and large, the
development stage in the linear model of innovation has received consistent
support from academics, researchers, and economists. The table below illustrates
how the linear model of innovation has evolved.
Authors                    Model
J. Huxley (1934)           background, basic, ad hoc, development
J.D. Bernal (1939)         pure (and fundamental), applied
V. Bush (1945)             basic, applied
Bowman (in Bush, 1945)     pure, background, applied and development
U.S. PSRB [3] (1947)       fundamental, background, applied, development
Canadian DRS [4] (1947)    fundamental, background, applied, development
R. N. Anthony              uncommitted, applied, development
U.S. NSF (1953)            basic, applied, development
British DSIR [5] (1958)    basic, applied and development, prototype
OECD (1962)                fundamental, applied, development

[1] The Linear Model of Innovation: The Historical Construction of an Analytical Framework by
[2] "Your job is to advise the state government of Massachusetts by describing and explaining how
well the linear model of innovation can be used as a guide for policy aimed at fostering
innovation." - competition problem statement
[3] President's Scientific Research Board
Note how often development appears in these early models. In fact, when economists
began to play a larger role in shaping the linear model of innovation, they
consistently kept the standard model (basic, applied, and development) intact as a
foundation for later economic theories and models. Economists agreed [6] on the
standard three categories for analyzing industrial research and kept development
as an important part of their models. The table below illustrates how economists
refined and redefined the linear model of innovation while retaining development.

Authors                        Model
Mees (1920)                    Pure science, development, manufacturing
Schumpeter (1939)              Invention, innovation, imitation
Stevens (1941)                 Fundamental research, applied research, test-tube or bench
                               research, pilot plant, production (improvement, trouble
Bichowsky (1942)               Research, engineering (or development), factory (or production)
Furnas (1948)                  Exploratory and fundamental research, applied research,
                               development, production
Mees and Leermakers (1950)     Research, development (establishment of small-scale use, pilot
                               plant and models, adoption in manufacturing)
Brozen (1951a)                 Invention, innovation, imitation
Brozen (1951b)                 Research, engineering development, production, service
Maclaurin (1953)               Pure science, invention, innovation, finance, acceptance
Ruttan (1959)                  Invention, innovation, technological change
Ames (1961)                    Research, invention, development, innovation
Scherer (1965)                 Invention, entrepreneurship, investment, development
Schmookler (1966)              Research, development, invention
Mansfield (1968)               Invention, diffusion, innovation
Myers and Marquis (1969)       Problem solving, solution, utilization, diffusion
Utterback (1974)               Generation of an idea, problem-solving or development,
                               implementation, and diffusion

[4] Department of Reconstruction and Supply
[5] Department of Scientific and Industrial Research
[6] "They finally settled on the conventional taxonomy, using the standard three categories to
analyze industrial research and using numbers on R&D for measuring the contribution of science to economic
progress (Godin 2004)." - Godin, Linear Model of Innovation


By the 1960s, the Organisation for Economic Co-operation and Development (OECD)
formally defined development as, "The use of the results of fundamental and applied
research directed to the introduction of useful materials, devices, products, systems,
and processes or the improvement of existing ones." The mention of "useful
materials, devices, products, systems, and processes or the improvement of existing
ones" led me to concentrate my analysis on statistics that capture utility
and innovation. Therefore, my analysis focuses primarily on utility patents as a
way to measure and evaluate innovation.

Analysis:
My analysis begins with external data from the Bloomberg Innovation Ranking
System [7]. The Bloomberg system annually ranks the top twenty states in the United
States for innovation, assigning each a score. The data was collected from the
Bureau of Economic Analysis, Bureau of Labor Statistics, National Science
Foundation, U.S. Census, and the U.S. Patent and Trademark Office. Below is a brief
innovation profile of Massachusetts according to Bloomberg's ranking system.

[7] http://www.bloomberg.com/visual-data/best-and-worst/most-innovative-in-u-dot-s-states

Massachusetts (ranked third in the country)
STEM professionals as a percentage of state population: 3.44%
Science and tech degree holders as a percentage of state population: 11.84%
Utility patents granted as a percentage of U.S. total: 4.74%
State government R&D spending as a percentage of U.S. total: 0.35%
Gross state product per employed person: $110,325
Three-year change in productivity: 3.39%
Public tech companies as a percentage of all public companies based in the state: 29.19%

The Bloomberg dataset is particularly useful because it includes information on
utility patents. To measure and evaluate innovation, I wanted to focus on utility
patents and understand their role in innovation. First, utility patents matter to
innovation because they are intellectual property. According to the World
Intellectual Property Organization, "Intellectual property (IP) refers to
creations of the mind, such as inventions; literary and artistic works; designs; and
symbols, names and images used in commerce. IP is protected in law by, for
example, patents, copyright, and trademarks, which enable people to earn
recognition or financial benefit from what they invent or create. By striking the right
balance between the interests of innovators and the wider public interest, the IP
system aims to foster an environment in which creativity and innovation can
flourish." The definition ties IP protection directly to innovation, which gives
me enough confidence to use utility patents as an indicator of innovation. Second,
a striking number of states in the Bloomberg innovation ranking have both high
innovation scores and high utility patent shares. Below is the utility patent
distribution, ordered with the highest scores on the left.

Seeing that utility patents appear to be important to innovation, I decided to build
a linear model using the data collected from Bloomberg. Below is a table of the
variables that I used to study innovation and utility patents.

Name of variable                                                                    Model name for variable
STEM professionals as a percentage of state population                              stem
Science and tech degree holders as a percentage of state population                 scienceprof
Utility patents granted as a percentage of U.S. total                               utilitypatents
State government R&D spending as a percentage of U.S. total                         govrd
Gross state product per employed person                                             gsp
Three-year change in productivity                                                   threeyearproduct
Public tech companies as a percentage of all public companies based in the state    pubtechcomp

The Statistical Analysis of the Bloomberg Innovation Ranking System
I began by exploring the Bloomberg dataset using scatterplots. These exploratory
data analysis techniques are useful for understanding the data, and the graphs
help identify any apparent trends. Nothing unusual stood out in the scatterplots
apart from several linear relationships between the variables and the innovation
score.
pairs(bloombergdata)



The initial model, fitted without any transformations or variable selection, is
shown below. Its summary statistics show a high adjusted R-squared value. However,
there is reason to suspect heteroskedasticity, i.e. that the variance of the
residuals is not constant. I conducted the Breusch-Pagan (or Cook-Weisberg) test to
check for non-constant variance. The test rejects the null hypothesis of constant
variance at the 5% level (p = 0.028), so the initial model shows evidence of
non-constant variance.

summary(model <- lm(score ~ stem + scienceprof + utilitypatents + govrd
+ threeyearproduct + pubtechcomp + gsp))
##
## Call:
## lm(formula = score ~ stem + scienceprof + utilitypatents + govrd +
##     threeyearproduct + pubtechcomp + gsp)
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -1.123e+01  5.500e+00  -2.042 0.047824 *
## stem              5.704e+00  1.824e+00   3.128 0.003278 **
## scienceprof       3.146e+00  6.371e-01   4.938 1.44e-05 ***
## utilitypatents    2.677e+00  7.250e-01   3.692 0.000664 ***
## govrd             1.082e+00  3.852e-01   2.809 0.007652 **
## threeyearproduct  1.399e+00  3.146e-01   4.447 6.77e-05 ***
## pubtechcomp       5.664e-01  1.163e-01   4.870 1.79e-05 ***
## gsp               1.139e-04  6.236e-05   1.826 0.075313 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.821 on 40 degrees of freedom
## Multiple R-squared:  0.9077, Adjusted R-squared:  0.8916
## F-statistic: 56.21 on 7 and 40 DF,  p-value: < 2.2e-16
library(car)  # provides ncvTest()
ncvTest(model)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 4.806927 Df = 1 p = 0.02834556
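
To make the test above less of a black box, here is a minimal base-R sketch of a score test for non-constant variance against the fitted values, the same kind of statistic that ncvTest() reports. The data are synthetic, and the function name ncv_score_test is my own for illustration; neither comes from the Bloomberg analysis.

```r
# Synthetic data (an assumption, not the Bloomberg data): the error
# spread grows with x, so the residual variance is non-constant by design.
set.seed(1)
x <- runif(200, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(200, sd = x)
fit <- lm(y ~ x)

# Score test for non-constant variance against the fitted values.
# ncv_score_test is a hypothetical helper name used only in this sketch.
ncv_score_test <- function(fit) {
  e <- residuals(fit)
  u <- e^2 / mean(e^2)                       # squared residuals scaled by the ML variance estimate
  aux <- lm(u ~ fitted(fit))                 # auxiliary regression on the fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)   # explained sum of squares of the auxiliary fit
  chisq <- reg_ss / 2                        # test statistic, chi-squared with 1 df under the null
  c(chisq = chisq, df = 1,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

result <- ncv_score_test(fit)
print(result)
```

A small p-value, as in this deliberately heteroskedastic example, is evidence against constant variance.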


The initial model needed to be altered so that it satisfies the OLS regression
assumptions. Therefore, I transformed the response variable (score) using a
Box-Cox transformation, with the parameter lambda chosen by maximum likelihood.

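library(MASS)  # provides boxcox()
# Profile the Box-Cox log-likelihood over a grid of lambda values
lmbd <- boxcox(model, data = bloombergdata, lambda = seq(-2, 2), main =
"Transform Score", xlab = "lambda", ylab = "log-likelihood")
# Pick the lambda that maximizes the log-likelihood
lambda <- lmbd$x[which.max(lmbd$y)]
# Apply the Box-Cox transformation to the response
trans.score <- (score^lambda - 1) / lambda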
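
As a rough illustration of what the lambda search does, the sketch below computes the Box-Cox profile log-likelihood by hand in base R and picks the maximizing lambda. It uses synthetic data and helper names of my own (boxcox_transform, profile_loglik); it shows the general technique and is not a reproduction of the run above.

```r
# Synthetic positive response (an assumption): log(y) is linear in x,
# so the likelihood should favor a lambda close to 0.
set.seed(2)
x <- runif(100, min = 0, max = 4)
y <- exp(1 + 0.5 * x + rnorm(100, sd = 0.1))

# Box-Cox transform; the log is the limiting case as lambda approaches 0.
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant).
profile_loglik <- function(lambda, y, x) {
  z <- boxcox_transform(y, lambda)
  rss <- sum(residuals(lm(z ~ x))^2)
  n <- length(y)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

lambdas <- seq(-2, 2, by = 0.1)
loglik <- sapply(lambdas, profile_loglik, y = y, x = x)
lambda_hat <- lambdas[which.max(loglik)]   # maximum likelihood choice on the grid
y_trans <- boxcox_transform(y, lambda_hat)
print(lambda_hat)
```

The second term of the log-likelihood is the Jacobian of the transformation; without it, likelihoods for different lambda values would not be comparable.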


The non-constant variance test for the transformed model is shown below. With a
p-value of 0.61398, the test fails to reject the null hypothesis of constant
variance, so the transformed model shows no evidence of heteroskedasticity.

ncvTest(model.transformed)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.2544199 Df = 1 p = 0.61398


I also wanted to check which variables were statistically insignificant using a
formal variable selection method. The method I used below is backward stepwise
selection based on the Akaike Information Criterion (AIC).

null <- lm(trans.score ~ 1)
step.backward <- step(model.transformed,
                      scope = list(lower = model.transformed, upper = null),
                      direction = "backward")
## Start: AIC=175.14
## trans.score ~ stem + scienceprof + utilitypatents + govrd + pubtechcomp + gsp
model.final <- step.backward
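
For comparison, here is a minimal sketch of backward stepwise selection by AIC on synthetic data, using the conventional scope layout. In step(), the terms in the lower component of scope are always kept and the upper component is the largest model allowed, so backward elimination normally runs from the full model down toward an intercept-only lower bound. The data frame dat and its columns are assumptions made for illustration.

```r
# Synthetic data (an assumption): y depends on x1 and x2; x3 is pure noise.
set.seed(3)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = dat)

# lower = smallest model allowed (intercept only), upper = largest model allowed
selected <- step(full,
                 scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                 direction = "backward", trace = 0)

kept <- attr(terms(selected), "term.labels")   # predictors that survived selection
print(kept)
```

AIC-based elimination retains the strong predictors; whether a noise variable such as x3 is dropped depends on the sample.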


The final model is shown below. It results from the Box-Cox transformation of the
response variable score, variable selection using backward stepwise, and testing
for heteroskedasticity with the Cook-Weisberg test. Notice that the variable
threeyearproduct was dropped from the final model; it had been the weakest of
the variables. More importantly, utility patents are statistically significant in
the model.




print(summary(model.final))
##
## Call:
## lm(formula = trans.score ~ stem + scienceprof + utilitypatents +
##     govrd + pubtechcomp + gsp)
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -6.626e+00  5.459e+00  -1.214 0.231795
## stem            4.687e+00  1.816e+00   2.581 0.013527 *
## scienceprof     2.661e+00  6.340e-01   4.197 0.000141 ***
## utilitypatents  2.898e+00  6.946e-01   4.172 0.000153 ***
## govrd           7.466e-01  3.812e-01   1.959 0.056997 .
## pubtechcomp     4.324e-01  1.154e-01   3.746 0.000554 ***
## gsp             8.706e-05  6.209e-05   1.402 0.168391
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.797 on 41 degrees of freedom
## Multiple R-squared:  0.8612, Adjusted R-squared:  0.8408
## F-statistic: 42.39 on 6 and 41 DF,  p-value: 4.802e-16
plot(model.final)


Additionally, I included the residual graphs of the final model. The graphs have
helped me with outlier detection and testing for non-constant variance. For outlier
detection, the leverage graphs were especially helpful.
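
The leverage information in those graphs can also be inspected numerically. The sketch below, on synthetic data with one deliberately extreme observation, flags high-leverage points using the common 2p/n rule of thumb and reports Cook's distance. The data and the cutoff are assumptions for illustration, not values from my analysis.

```r
# Synthetic data (an assumption): 30 ordinary points plus one observation
# placed far from the rest in x, giving it high leverage.
set.seed(4)
x <- c(rnorm(30), 8)
y <- 1 + 2 * x + rnorm(31)
fit <- lm(y ~ x)

lev <- hatvalues(fit)          # leverage of each observation
cook <- cooks.distance(fit)    # influence of each observation on the fit

p <- length(coef(fit))         # number of estimated coefficients
n <- length(x)
high_leverage <- which(lev > 2 * p / n)   # rule-of-thumb cutoff

print(high_leverage)
print(round(cook[high_leverage], 3))
```

High leverage alone does not make a point harmful; Cook's distance combines leverage with the size of the residual to show whether the point actually moves the fit.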


The model is quite accurate in predicting the scores of the states when each
state's actual values are plugged into it. However, the model has limits: it
does not reveal any obvious trends or relationships among the variables. So far,
it has told me only that utility patents are statistically significant.
Therefore, combining the model with some data visualization, I produced a few
plots that help explain some of the trends in the dataset.
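
Because the final model is fitted to a Box-Cox transformed score, its predictions have to be mapped back to the original scale before they can be compared with actual scores. A minimal sketch of that round trip follows, with synthetic data, a hypothetical lambda of 0.5, and helper names of my own (boxcox_forward, boxcox_inverse).

```r
# Forward and inverse Box-Cox transforms (hypothetical helper names).
boxcox_forward <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
boxcox_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Synthetic data (an assumption): a positive score whose square root is linear in x.
set.seed(5)
lambda <- 0.5   # hypothetical value, not the lambda estimated in the analysis
dat <- data.frame(x = runif(50, min = 1, max = 10))
dat$score <- (3 + 2 * dat$x + rnorm(50, sd = 0.3))^2
dat$z <- boxcox_forward(dat$score, lambda)

fit <- lm(z ~ x, data = dat)

# Predict on the transformed scale, then map back to the score scale.
pred_z <- predict(fit, newdata = data.frame(x = c(2, 5, 8)))
pred_score <- boxcox_inverse(pred_z, lambda)
print(pred_score)
```

Back-transformed predictions estimate something closer to the median than the mean of the original response, which is worth keeping in mind when judging accuracy.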
Below is a graph of the state scores (y-axis) against utility patents (x-axis). The
graph shows the direct trend between score and utility patents, together with
scienceprof and stem. The points are distinguished by size and color, as shown in
the legend at the side of the graph. Darker blue points represent states with
fewer STEM degree holders as a share of the total population, and smaller circles
represent states with fewer science professionals (and vice versa). The large
circle with the arrow pointing at it is Massachusetts.

There does seem to be a trend. The circles get larger and lighter as they move up
and to the right along both axes. In this graph, utility patents appear to be
positively correlated with STEM degree holders and science professionals.
library(ggplot2)
ggplot(data = bloombergdata,
       aes(y = score, x = utilitypatents, color = stem, size = scienceprof)) +
  geom_point()


In the graph, notice how states with lower utility patent shares tend to have
lower stem and scienceprof values.
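
A visual impression like this can be backed up with a correlation matrix. The sketch below uses a synthetic data frame that borrows the variable names stem, scienceprof, and utilitypatents; the values are made up for illustration and are not the Bloomberg figures. Spearman rank correlation is used here as one reasonable choice because it does not assume a linear relationship.

```r
# Synthetic stand-in data (an assumption): the three variables are built
# to move together so the sketch has something to detect.
set.seed(6)
n <- 50
stem <- runif(n, min = 1, max = 4)
toy <- data.frame(
  stem = stem,
  scienceprof = 3 * stem + rnorm(n, sd = 1),
  utilitypatents = 0.8 * stem + rnorm(n, sd = 0.5)
)

# Pairwise Spearman rank correlations between the three variables.
cors <- cor(toy, method = "spearman")
print(round(cors, 2))
```

With the real data, the same call on the relevant columns of bloombergdata would show whether the pattern in the plot holds numerically.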
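The graph below adds a smoothed trend line through the data points, where states
above the line have above average innovation scores. Note that stat_smooth() fits
a loess curve by default for a dataset this small; a straight regression line
requires method = "lm".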

ggplot(data = bloombergdata,
       aes(y = score, x = utilitypatents, color = stem, size = scienceprof)) +
  geom_point() + stat_smooth()


ggplot(data = bloombergdata,
       aes(y = score, x = utilitypatents, color = govrd, size = gsp)) +
  geom_point()


I also wanted to compare utility patents with financial data such as government
spending on research and development and gross state product per employed
person. The most important feature of this graph is the striking number of
dark-colored states. Even the states with high innovation scores are dark.
It seems that either government spending on research and development does not
strongly relate to innovation score, or its effects are felt in areas that are not
included in the model. In addition, there seem to be scaling issues with the
govrd data; however, even after a log transformation the trend is unclear. Despite
these difficulties, government spending on R&D is a variable worth studying in the
future because it is statistically significant in the initial model (p = 0.008),
marginally significant in the final model (p = 0.057), and the variable that
relates most directly to the state officials of Massachusetts.


ggplot(data = bloombergdata,
       aes(y = score, x = utilitypatents, color = govrd, size = gsp)) +
  geom_point() + stat_smooth()

Here is another graph that studies the relationship between government spending
on R&D, utility patents, and public technology companies. Its message is similar
to that of the last one: what stands out is the number of small circles, i.e.
states with low government spending on R&D. Again, after a log transformation of
govrd the size trend is still unclear.
ggplot(data = bloombergdata,
       aes(y = score, x = utilitypatents, color = pubtechcomp, size = govrd)) +
  geom_point()


Having become reasonably convinced that utility patents are an important measure
of innovation, I looked at the ratio of the cost of intellectual property stock to
investment in the dataset we were given. The highest ratios (intellectual property
divided by investment) were in software, electronics, chemical products, and
motion pictures and sound recording.

Industry                                          2011          2012          2013          2014
Computer and electronic products                  97.5          82.6          80.51851852   84.40740741
Chemical products                                 113.9230769   153.5806452   130.6666667   136.7692308
Publishing industries (includes software)         100.6363636   82.14285714   106.8181818   99.5
Motion pictures and sound recording industries    318.5714286   486           565.25        469.4


These ratios show that computer and electronic products, chemical products,
publishing industries, and motion pictures all have a cost of IP stock per unit of
investment that is higher than in many other industries. These industries are
likely to have the most infrastructure and support, and they are likely to be the
best future investments for state governments such as Massachusetts.
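
To compare the four industries at a glance, the yearly ratios reported above can be averaged and ranked. The sketch below does this in base R using the figures from the table; the four-year mean is my own summary choice for illustration and was not part of the original analysis.

```r
# IP-stock-to-investment ratios for 2011 to 2014, as reported in the table.
ratios <- rbind(
  "Computer and electronic products" =
    c(97.5, 82.6, 80.51851852, 84.40740741),
  "Chemical products" =
    c(113.9230769, 153.5806452, 130.6666667, 136.7692308),
  "Publishing industries (includes software)" =
    c(100.6363636, 82.14285714, 106.8181818, 99.5),
  "Motion pictures and sound recording industries" =
    c(318.5714286, 486, 565.25, 469.4)
)
colnames(ratios) <- c("2011", "2012", "2013", "2014")

# Average each industry's ratio over the four years and rank from highest to lowest.
mean_ratio <- sort(rowMeans(ratios), decreasing = TRUE)
print(round(mean_ratio, 1))
```

On these figures, motion pictures and sound recording stands well apart from the other three industries, which sit much closer to one another.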


Conclusion:
My analysis focuses primarily on the development stage of the linear model of
innovation and on utility patents. The development stage is an important part of
the linear model of innovation because of its historical consistency and
technological relevance. In addition, the development stage, which consists of
improving and inventing useful materials, devices, products, systems, and processes,
has properties that describe innovation. Based on my analysis, I believe that
utility patents are a decent measure of innovation in development. Utility patent
statistics are useful because they measure how many new inventions are being
patented. After applying several statistical techniques, I arrived at several
findings about utility patents. Most notably, utility patents are positively
correlated with science-related education and work.
I also discovered that financial data, especially government spending on research
and development, is difficult to analyze. It is not clear whether there is a trend
between government spending and utility patents. Government spending on research
and development is difficult to study because of its complicated nature: many
factors likely go into a state's research and development budget.
First, I recommend that Massachusetts, and any other state, rigorously evaluate its
government spending on research and development. It is unclear how much a state
should invest in research and development to be efficient or optimally innovative.
For that reason, I recommend that Massachusetts and other states be conservative
with their research and development budgets. Second, Massachusetts should evaluate
the following industries: computer and electronic products, chemical products,
publishing industries, and motion pictures, because of their relatively high
IP-to-investment ratios. Lastly, I recommend that Massachusetts invest in
institutionalized innovation, where programs that promote the sciences can help
improve utility patent acceptance rates.
