Bus C Analysis

BUSC
Data Analytics
Kevin Choi
November 20, 2014

Abstract:
The data competition asked two things of its participants: choose a particular
stage of the linear model of innovation [1], and advise the Massachusetts government [2]
on how to foster innovation. The purpose of my analysis is twofold: first, to
briefly explain why I've chosen the development stage of the innovation model,
and second, to present my recommendations to the Massachusetts government,
centered on utility patents, that aim to foster innovation.

Introduction:
I chose the development stage of the linear model of innovation
mainly because of its historical consistency and technological relevance. By and
large, the development stage in the linear model of innovation has received consistent
support from academics, researchers, and economists. Below is a brief table that
illustrates the evolution of the linear model of innovation.
Authors                    Model
J. Huxley (1934)           background, basic, ad hoc, development
J.D. Bernal (1939)         pure (and fundamental), applied
V. Bush (1945)             basic, applied
Bowman (in Bush, 1945)     pure, background, applied and development
U.S. PSRB [3] (1947)       fundamental, background, applied, development
Canadian DRS [4] (1947)    fundamental, background, applied, development
R. N. Anthony              uncommitted, applied, development
U.S. NSF (1953)            basic, applied, development
British DSIR [5] (1958)    basic, applied and development, prototype
OECD (1962)                fundamental, applied, development

[1] The Linear Model of Innovation: The Historical Construction of an Analytical Framework
[2] "Your job is to advise the state government of Massachusetts by describing and explaining how
well the linear model of innovation can be used as a guide for policy aimed at fostering
innovation." - competition problem statement
[3] President's Scientific Research Board
Note the number of times development is included in these early models. In fact,
when economists began to play a larger role in the linear model of innovation, they
consistently kept the standard model (basic, applied, and development) intact as a
foundation for future economic theories and models. Economists agreed [6] on the
standard three categories for analyzing industrial research and have kept
development as an important part of their models. The table below
illustrates how economists have refined and redefined the linear model of
innovation while retaining development.
Authors                       Model
Mees (1920)                   Pure science, development, manufacturing
Schumpeter (1939)             Invention, innovation, imitation
Stevens (1941)                Fundamental research, applied research, test-tube or bench
                              research, pilot plant, production (improvement, trouble
Bichowsky (1942)              Research, engineering (or development), factory (or production)
Furnas (1948)                 Exploratory and fundamental research, applied research,
                              development, production
Mees and Leermakers (1950)    Research, development (establishment of small-scale use, pilot
                              plant and models, adoption in manufacturing)
Brozen (1951a)                Invention, innovation, imitation
Brozen (1951b)                Research, engineering development, production, service
Maclaurin (1953)              Pure science, invention, innovation, finance, acceptance
Ruttan (1959)                 Invention, innovation, technological change
Ames (1961)                   Research, invention, development, innovation
Scherer (1965)                Invention, entrepreneurship, investment, development
Schmookler (1966)             Research, development, invention
Mansfield (1968)              Invention, diffusion, innovation
Myers and Marquis (1969)      Problem solving, solution, utilization, diffusion
Utterback (1974)              Generation of an idea, problem-solving or development,
                              implementation, and diffusion

[4] Department of Reconstruction and Supply
[5] Department of Scientific and Industrial Research
[6] "They finally settled on the conventional taxonomy, using the standard three categories to
analyze industrial research and using numbers on R&D for measuring the contribution of science to economic
progress (Godin 2004)." - Godin, Linear Model of Innovation

By the 1960s, the Organisation for Economic Co-operation and Development (OECD)
formally defined development as "The use of the results of fundamental and applied
research directed to the introduction of useful materials, devices, products, systems,
and processes or the improvement of existing ones." The mention of "useful
materials, devices, products, systems, processes or the improvement of existing
ones" has allowed me to concentrate my analysis on statistics that capture utility
and innovation. Therefore, my analysis focuses primarily on utility patents to
measure and evaluate innovation.
Analysis:
My analysis begins with external data from the Bloomberg Innovation Ranking
System (http://www.bloomberg.com/visual-data/best-and-worst/most-innovative-in-u-dot-s-states).
The Bloomberg system annually ranks the top twenty states in the United
States for innovation with a score. The data was collected from the Bureau of
Economic Analysis, Bureau of Labor Statistics, National Science Foundation, U.S.
Census, and the U.S. Patent and Trademark Office. Below is a brief innovation profile
of Massachusetts according to Bloomberg's ranking system.
Massachusetts (ranked third in the country)

STEM professionals as a percentage of state population                           3.44%
Science and tech degree holders as a percentage of state population              11.84%
Utility patents granted as a percentage of U.S. total                            4.74%
State government R&D spending as a percentage of U.S. total                      0.35%
Gross state product per employed person                                          $110,325
Three-year change in productivity                                                3.39%
Public tech companies as a percentage of all public companies based in the state 29.19%

The Bloomberg dataset is particularly useful because it holds unique information
about utility patents. To measure and evaluate innovation, I wanted to focus on
utility patents and understand their role in innovation. First, utility patents seem
important to innovation because they are intellectual property. According to
the World Intellectual Property Organization, "Intellectual property (IP) refers to
creations of the mind, such as inventions; literary and artistic works; designs; and
symbols, names and images used in commerce. IP is protected in law by, for
example, patents, copyright, and trademarks, which enable people to earn
recognition or financial benefit from what they invent or create. By striking the right
balance between the interests of innovators and the wider public interest, the IP
system aims to foster an environment in which creativity and innovation can
flourish." The definition ties IP protection explicitly to innovation, which
gives me enough confidence to pursue utility patents as an appropriate indicator
for innovation. Second, a striking number of states have both high
innovation scores and high utility patent shares in the Bloomberg ranking. Below is
the utility patent distribution, ordered with the highest-scoring states on the left.
Seeing that utility patents appear to be important to innovation, I decided to build
a linear model using the data collected from Bloomberg. Below is a table of the
variables that I used to study innovation and utility patents.
Name of variable                                                                  Model name for variables
STEM professionals as a percentage of state population                            stem
Science and tech degree holders as a percentage of state population               scienceprof
Utility patents granted as a percentage of U.S. total                             utilitypatents
State government R&D spending as a percentage of U.S. total                       govrd
Gross state product per employed person                                           gsp
Three-year change in productivity                                                 threeyearproduct
Public tech companies as a percentage of all public companies based in the state  pubtechcomp

The Statistical Analysis of the Bloomberg Innovation Ranking System
I began by exploring the Bloomberg dataset with a scatterplot matrix. Exploratory
data analysis of this kind is useful for understanding the data, and the
graphs help identify any apparent trends. Nothing unusual stood
out in the scatterplots apart from several roughly linear relationships between the
variables and the innovation score.
pairs(bloombergdata)

The initial model, fitted without any transformations or variable selection, is shown below.
Its summary statistics show a high adjusted R-squared value (0.8916). However, there
are signs of heteroskedasticity, that is, non-constant error variance. I conducted the
Breusch-Pagan (or Cook-Weisberg) test to check for non-constant variance. The
test rejects the null hypothesis of constant variance at the 5% level (p = 0.028),
so the initial model fails this check.

summary(model <- lm(score ~ stem + scienceprof + utilitypatents + govrd
+ threeyearproduct + pubtechcomp + gsp))
##
## Call:
## lm(formula = score ~ stem + scienceprof + utilitypatents + govrd +
## threeyearproduct + pubtechcomp + gsp)
##
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -1.123e+01 5.500e+00 -2.042 0.047824 *
## stem 5.704e+00 1.824e+00 3.128 0.003278 **
## scienceprof 3.146e+00 6.371e-01 4.938 1.44e-05 ***
## utilitypatents 2.677e+00 7.250e-01 3.692 0.000664 ***
## govrd 1.082e+00 3.852e-01 2.809 0.007652 **
## threeyearproduct 1.399e+00 3.146e-01 4.447 6.77e-05 ***
## pubtechcomp 5.664e-01 1.163e-01 4.870 1.79e-05 ***
## gsp 1.139e-04 6.236e-05 1.826 0.075313 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.821 on 40 degrees of freedom
## Multiple R-squared: 0.9077, Adjusted R-squared: 0.8916
## F-statistic: 56.21 on 7 and 40 DF, p-value: < 2.2e-16
library(car)  # provides ncvTest()
ncvTest(model)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 4.806927 Df = 1 p = 0.02834556
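
To show what this score test computes, here is a minimal base-R sketch of the same statistic on simulated data. The helper name score_test_ncv and the simulated variables x and y are assumptions made for illustration; they are not part of the original analysis, which relies on ncvTest() from the car package.

```r
# Sketch of the Breusch-Pagan / Cook-Weisberg score test in base R.
# score_test_ncv() is a hypothetical helper written for illustration only.
score_test_ncv <- function(fit) {
  e <- residuals(fit)
  n <- length(e)
  # Squared residuals scaled by the maximum-likelihood error variance (RSS / n)
  u <- e^2 / (sum(e^2) / n)
  # Auxiliary regression of the scaled squared residuals on the fitted values
  aux <- lm(u ~ fitted(fit))
  # Score statistic: half the regression sum of squares of the auxiliary fit
  reg_ss <- sum((fitted(aux) - mean(u))^2)
  chisq <- reg_ss / 2
  list(chisq = chisq, df = 1,
       p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Simulated data whose error spread grows with x (heteroskedastic by design)
set.seed(1)
x <- runif(500, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(500, sd = x)
result <- score_test_ncv(lm(y ~ x))
print(result)
```

A small p-value, as with the initial model above, is evidence against constant error variance.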

The initial model needed to be altered to satisfy the OLS regression
assumptions. Therefore, I chose to transform the response variable (score) using a
Box-Cox transformation, with the parameter lambda chosen by maximum likelihood.

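library(MASS)  # provides boxcox()
lmbd <- boxcox(model, data = bloombergdata, lambda = seq(-2, 2), main =
"Transform Score", xlab = "lambda", ylab = "log-likelihood")
lambda <- lmbd$x[which.max(lmbd$y)]  # lambda that maximizes the log-likelihood

trans.score <- (score^lambda - 1) / lambda
# Missing from the original listing; predictors assumed to match the initial model
model.transformed <- lm(trans.score ~ stem + scienceprof + utilitypatents + govrd +
threeyearproduct + pubtechcomp + gsp)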
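
The transformation above is undefined at lambda = 0, where the Box-Cox family reduces to the natural logarithm. The following is a minimal sketch of a transform and its inverse that handles that case. The helper names box_cox and box_cox_inverse and the example scores are hypothetical and do not come from the original analysis.

```r
# Box-Cox transform with the lambda = 0 limit handled explicitly.
# box_cox() and box_cox_inverse() are hypothetical helper names.
box_cox <- function(y, lambda, eps = 1e-8) {
  stopifnot(all(y > 0))  # the transform is defined for positive values only
  if (abs(lambda) < eps) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, for mapping fitted values back to the original score scale
box_cox_inverse <- function(z, lambda, eps = 1e-8) {
  if (abs(lambda) < eps) exp(z) else (lambda * z + 1)^(1 / lambda)
}

scores <- c(10, 25, 50, 90)  # made-up scores for illustration
z <- box_cox(scores, lambda = 0.5)
back <- box_cox_inverse(z, lambda = 0.5)
print(cbind(scores, z, back))
```

The inverse matters when predictions from a model fitted to the transformed score need to be reported on the original scale.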

The test result for the transformed model is shown below. With a p-value of
0.61398, the test fails to reject the null hypothesis of constant variance, so the
transformed model shows no evidence of heteroskedasticity.

ncvTest(model.transformed)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.2544199 Df = 1 p = 0.61398

I also wanted to check which variables contribute little to the model, using a formal
variable selection method. The method I used below is backward stepwise
selection based on the Akaike Information Criterion (AIC).

null <- lm(trans.score ~ 1)
# Backward elimination: the intercept-only model is the lower bound of the search
step.backward <- step(model.transformed, scope = list(lower = formula(null),
upper = formula(model.transformed)), direction = "backward")
## Start: AIC=175.14
## trans.score ~ stem + scienceprof + utilitypatents + govrd + pubtechcomp + gsp
model.final <- step.backward
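
As a self-contained illustration of backward elimination by AIC, the sketch below runs step() on R's built-in mtcars data rather than the Bloomberg data. The choice of dataset and predictors is an assumption made only so the example runs on its own.

```r
# Backward elimination by AIC on the built-in mtcars data (not the Bloomberg data).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- step(full, direction = "backward", trace = 0)

# extractAIC() returns c(equivalent df, AIC) on the scale that step() uses
aic_full <- extractAIC(full)[2]
aic_reduced <- extractAIC(reduced)[2]
print(formula(reduced))
print(c(full = aic_full, reduced = aic_reduced))
```

At each step the procedure drops the term whose removal lowers the AIC the most and stops when no removal lowers it further, so the selected model never has a higher AIC than the starting model.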

The final model is shown below. It is the result of the Box-Cox transformation of the
response variable score, backward stepwise variable selection, and a check for
heteroskedasticity using the Cook-Weisberg test. Notice that the variable
threeyearproduct was dropped from the final model by the stepwise procedure.
More importantly, utility patents remain statistically significant in the
model (p = 0.000153).

print(summary(model.final))
##
## Call:
## lm(formula = trans.score ~ stem + scienceprof + utilitypatents +
## govrd + pubtechcomp + gsp)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.626e+00 5.459e+00 -1.214 0.231795
## stem 4.687e+00 1.816e+00 2.581 0.013527 *
## scienceprof 2.661e+00 6.340e-01 4.197 0.000141 ***
## utilitypatents 2.898e+00 6.946e-01 4.172 0.000153 ***
## govrd 7.466e-01 3.812e-01 1.959 0.056997 .
## pubtechcomp 4.324e-01 1.154e-01 3.746 0.000554 ***
## gsp 8.706e-05 6.209e-05 1.402 0.168391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.797 on 41 degrees of freedom
## Multiple R-squared: 0.8612, Adjusted R-squared: 0.8408
## F-statistic: 42.39 on 6 and 41 DF, p-value: 4.802e-16
plot(model.final)

Additionally, I included the residual graphs of the final model. The graphs have
helped me with outlier detection and testing for non-constant variance. For outlier
detection, the leverage graphs were especially helpful.

The model fits the state scores well (adjusted R-squared of 0.84) when each state's
actual values are plugged into it. However, the model has limits: it
does not reveal any obvious trends or relationships among the variables.
It has only told me that utility patents are statistically
significant, and not much more. Therefore, combining the model with
some data visualization, I came up with a few plots that help explain some of the
trends in the dataset.
Below is a graph of the state scores (y-axis) against utility patents (x-axis). The graph
shows how score relates to utilitypatents, scienceprof, and stem.
The legend on the side of the graph distinguishes states by size and color. The darker
blue colors represent states with lower values of stem, and the smaller circles
represent states with lower values of scienceprof (and vice versa).
The large circle with the arrow pointing at it is
Massachusetts.
There does seem to be a trend. The points get larger and lighter as
they move up the y-axis and to the right along the x-axis. In this particular graph,
utility patents seem to be positively correlated with STEM degree holders and science
professionals.
library(ggplot2)  # provides ggplot(), geom_point(), stat_smooth()
ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = stem, size = scienceprof)) + geom_point()

In the graph, notice how states with lower utility patents tend to have lower stem
numbers and science professionals.
The graph below adds a smoothed trend line through the data points (the default
smoother of stat_smooth()). States above the line have higher innovation scores
than the trend predicts for their share of utility patents.
ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = stem, size = scienceprof)) + geom_point() + stat_smooth()
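
To identify which states fall above a fitted line, one option is to look at the sign of the residuals. The sketch below assumes a straight-line fit with lm() rather than the default smoother used in the plot, and it uses made-up values for five hypothetical states, not the Bloomberg data.

```r
# Made-up values for five hypothetical states; not taken from the Bloomberg data.
toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  utilitypatents = c(0.5, 1.0, 2.0, 4.0, 5.0),
  score = c(40, 55, 58, 80, 79)
)

# Straight-line fit of score on utility patents (an assumption for this sketch)
fit <- lm(score ~ utilitypatents, data = toy)

# A positive residual means the state sits above the fitted line
toy$above_line <- as.vector(residuals(fit) > 0)
print(toy)
```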

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = govrd, size = gsp)) + geom_point()

I also wanted to compare utility patents with financial data such as government
spending on research and development and gross state product per employed
person. The most important feature of this graph is the striking number of dark
colored states. Even the states that have high innovation scores are dark.
It seems that either government spending on research and development does not
strongly relate to innovation score, or its effects are felt in other areas that are not
included in the model. In addition, there seem to be scaling issues with the
govrd data. However, even with a log transformation the trend is unclear. In spite of
this, government spending on R&D is a variable worth
studying in the future because it is statistically significant in the initial model,
marginally significant in the final model (p = 0.057), and relates
most directly to the state officials of Massachusetts.

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = govrd, size = gsp)) + geom_point() + stat_smooth()
Here is another graph that studies the relationship between government spending
on R&D, utility patents, and public technology companies. The importance of this
graph is similar to the last one. What stands out is the number of small
points, that is, states with low government spending on R&D. Additionally,
after a log transformation of govrd the size trend is still unclear.
ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = pubtechcomp, size = govrd)) + geom_point()

After being relatively convinced that utility patents are an important measure of
innovation, I looked at the ratio of the cost of intellectual property stock
to investment in our given dataset. The highest ratios
(intellectual property divided by investment) were in software, electronics,
chemical products, and motion pictures and sound recording.
Industry                                          2011          2012          2013          2014
Computer and electronic products                  97.5          82.6          80.51851852   84.40740741
Chemical products                                 113.9230769   153.5806452   130.6666667   136.7692308
Publishing industries (includes software)         100.6363636   82.14285714   106.8181818   99.5
Motion pictures and sound recording industries    318.5714286   486           565.25        469.4
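
The ratio itself is a simple division of IP stock by investment for each industry and year. The sketch below shows the calculation and ranking on hypothetical figures. The industry names, column names, and numbers are assumptions for illustration, since the competition dataset is not reproduced here.

```r
# Hypothetical IP stock and investment figures; not the competition dataset.
ip <- data.frame(
  industry = c("Industry X", "Industry Y", "Industry Z"),
  ip_stock = c(1200, 450, 300),
  investment = c(10, 9, 2),
  stringsAsFactors = FALSE
)

# Ratio of IP stock to investment for each industry
ip$ratio <- ip$ip_stock / ip$investment

# Sort so that the industries with the highest ratio come first
ranked <- ip[order(ip$ratio, decreasing = TRUE), ]
print(ranked)
```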

These ratios show that computer and electronics, chemical products,
publishing industries, and motion pictures all have a cost of IP stock per unit of
investment that is higher than in many other industries. These industries are likely
to have the most infrastructure and support, and they are likely to be the best
future investments for state governments such as Massachusetts.

Conclusion:
My analysis focuses primarily on the development stage of the linear model of
innovation and on utility patents. The development stage is an important part of the
linear model of innovation because of its historical consistency and technological
relevance. In addition, the development stage, which consists of
improving and inventing useful materials, devices, products, systems, and processes,
has properties that describe innovation. Based on my analysis, I believe that utility
patents are a reasonable measure of innovation in development. Utility patent
statistics are useful because they measure how many new inventions are being
patented. After applying several statistical techniques, I found that
utility patents are positively correlated with science-related education
and work.
I also found that financial data, especially government spending on research
and development, is difficult to analyze. It is not clear whether there is a trend
between government spending and utility patents. Government spending on research
and development is difficult to study because of its complexity:
many factors likely go into a state's research and
development budget.
First, I recommend that Massachusetts, and any other state, rigorously evaluate its
government spending on research and development. It's unclear how much a state
should invest in research and development to be efficient or optimally innovative.
For that reason, I also recommend that Massachusetts and other states be
conservative with their research and development budgets. Second, Massachusetts
should evaluate the following industries: computer and electronics, chemical
products, publishing industries, and motion pictures, because of their relatively
high ratios of IP stock to investment. Lastly, I recommend that Massachusetts invest
in institutionalized innovation, where programs that promote the sciences can help
improve utility patent acceptance rates.
