Data Analytics
Kevin Choi
November 20, 2014

Abstract:
The data competition asked two things of its participants: choose a particular stage of the linear model of innovation [1], and advise the Massachusetts government [2] on how to foster innovation. The purpose of my analysis is two-fold: first, to briefly discuss why I've chosen the development stage of the innovation model, and second, to present my recommendations to the Massachusetts government, centered on utility patents, all of which aim to foster innovation.

Introduction:
I chose the development stage of the linear model of innovation mainly because of its historical consistency and technological relevance. By and large, the development stage has received consistent support from academics, researchers, and economists. Below is a brief table that illustrates the progress of the linear model of innovation.

Authors              Model
J. Huxley (1934)     background, basic, ad hoc, development
J.D. Bernal (1939)   pure (and fundamental), applied
V. Bush (1945)       basic, applied
R. N. Anthony
OECD (1962)

[2] "Your job is to advise the state government of Massachusetts by describing and explaining how well the linear model of innovation can be used as a guide for policy aimed at fostering innovation." (competition problem statement)

Note the number of times development is included in these early models. In fact, when economists began to play a larger role in the linear model of innovation, they consistently kept the standard model (basic, applied, and development) intact as a foundation for future economic theories and models. Economists agreed [6] on the standard three categories for analyzing industrial research. Below is another table that illustrates how economists have refined and redefined the linear model of innovation while keeping development as an important part of it.

Authors                        Model
Mees (1920)
Schumpeter (1939)
Stevens (1941)
Bichowsky (1942)
Furnas (1948)
Mees and Leermakers (1950)
Brozen (1951a)
Brozen (1951b)
Maclaurin (1953)
Ruttan (1959)
Ames (1961)
Scherer (1965)
Schmookler (1966)
Mansfield (1968)
Utterback (1974)

Analysis:
My analysis begins with external data from the Bloomberg Innovation Ranking System [7]. The Bloomberg system annually ranks the top twenty states in the United States for innovation with a score. The data was collected from the Bureau of Economic Analysis, Bureau of Labor Statistics, National Science Foundation, U.S. Census, and the U.S. Patent and Trademark Office.

[7] http://www.bloomberg.com/visual-data/best-and-worst/most-innovative-in-u-dot-s-states

Below is a brief innovation profile of Massachusetts according to Bloomberg's ranking system.

Massachusetts (ranked third in the country)
STEM professionals as a percentage of state population: 3.44%
Science and tech degree holders as a percentage of state population: 11.84%
Utility patents granted as a percentage of U.S. total: 4.74%
State government R&D spending as a percentage of U.S. total: 0.35%
Gross state product per employed person: $110,325
Three-year change in productivity: 3.39%
Public tech companies as a percentage of all public companies based in the state: 29.19%

Seeing that utility patents are seemingly important to innovation, I decided to build a linear model using the data collected from Bloomberg. Below is a table listing the variables that I used to study innovation and utility patents.

Name of variable
STEM professionals as a percentage of state population
Science and tech degree holders as a percentage of state population
utilitypatents
govrd
scienceprof
gsp
threeyearproduct
pubtechcomp

I included the initial model below without any transformations or variable selection. The initial model's summary statistics are shown below, and the model has a high adjusted R-squared value. However, there are signs of heteroskedasticity: the variance of the residuals appears to be non-constant. I conducted the Breusch-Pagan (or Cook-Weisberg) test to check for non-constant variance. The initial model fails the test, which indicates non-constant variance.

summary(model <- lm(score ~ stem + scienceprof + utilitypatents + govrd + threeyearproduct + pubtechcomp + gsp))
##
## Call:
## lm(formula = score ~ stem + scienceprof + utilitypatents + govrd +
##     threeyearproduct + pubtechcomp + gsp)
##
##
## Coefficients:

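The score test mentioned above can be reproduced by hand, which makes clear what it measures. The following is a minimal sketch in base R on simulated data (the variables `x`, `y`, and `fit` are hypothetical and are not the Bloomberg data); it follows the usual Cook-Weisberg recipe of regressing scaled squared residuals on the fitted values.

```r
# Hypothetical data: the error spread grows with x, so the variance is non-constant.
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Score test against the fitted values:
# 1. scale the squared residuals by their mean,
# 2. regress them on the fitted values,
# 3. half the regression sum of squares is chi-square with 1 df under H0.
u <- residuals(fit)^2 / mean(residuals(fit)^2)
aux <- lm(u ~ fitted(fit))
reg.ss <- sum((fitted(aux) - mean(u))^2)
chisq <- reg.ss / 2
p.value <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(Chisquare = chisq, Df = 1, p = p.value)
```

A small p-value here signals non-constant variance; a large one means the test fails to reject constant variance.
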
The initial model needed to be altered in some way to satisfy the OLS regression assumptions. Therefore, I chose to transform the response variable (score) using a Box-Cox transformation, with the parameter lambda chosen by maximum likelihood estimation.

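The source does not show how the chosen lambda is applied to score, so the following is only a sketch of one common way to do it, using simulated data. The names `d`, `lambda.hat`, and `box.cox` are hypothetical; `trans.score` and `model.transformed` reuse names that appear later in this analysis, but their actual construction is an assumption.

```r
library(MASS)  # boxcox()

# Hypothetical stand-in for the Bloomberg data: a positive, right-skewed response.
set.seed(2)
d <- data.frame(x = runif(60, 1, 5))
d$score <- exp(0.5 + 0.4 * d$x + rnorm(60, sd = 0.2))

# Profile the log-likelihood over a grid of lambda values (no plot).
bc <- boxcox(score ~ x, data = d, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]

# Apply the Box-Cox transform; lambda = 0 corresponds to log().
box.cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
d$trans.score <- box.cox(d$score, lambda.hat)
model.transformed <- lm(trans.score ~ x, data = d)
```

Picking the grid value with the highest profile log-likelihood is the maximum likelihood step; a finer grid gives a more precise lambda.
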
library(MASS)  # provides boxcox()
lmbd <- boxcox(model, data = bloombergdata, lambda = seq(-2, 2), main = "Transform Score",
               xlab = "lambda", ylab = "log-likelihood")

The new transformed model is shown below. It passes the non-constant variance test with a p-value of 0.61398. At that p-value, the test fails to reject the null hypothesis of constant variance.

ncvTest(model.transformed)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.2544199    Df = 1     p = 0.61398

I also wanted to check which variables were statistically insignificant using a formal variable selection method. The method I used below is backward stepwise selection based on the Akaike Information Criterion (AIC).

null <- lm(trans.score ~ 1)
# Backward search starts from the full model; the intercept-only model is the lower bound.
step.backward <- step(model.transformed,
                      scope = list(lower = formula(null), upper = formula(model.transformed)),
                      direction = "backward")
## Start:  AIC=175.14
## trans.score ~ stem + scienceprof + utilitypatents + govrd + pubtechcomp + gsp
model.final <- step.backward

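To make the AIC values printed by step() less opaque, here is a small sketch of how that number is computed for a linear model. It uses R's built-in mtcars data purely as a stand-in (it is not the competition data), and the names `full`, `aic.by.hand`, and `reduced` are hypothetical.

```r
# Built-in mtcars data, used here only as a stand-in for the competition data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# step() scores a linear model with extractAIC(): n * log(RSS / n) + 2 * edf.
n <- nrow(mtcars)
rss <- sum(residuals(full)^2)
edf <- n - df.residual(full)          # number of estimated coefficients
aic.by.hand <- n * log(rss / n) + 2 * edf
aic.step <- extractAIC(full)[2]

# Backward elimination: drop one term at a time while AIC keeps falling.
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)
```

Note that this criterion differs from the value returned by AIC() by an additive constant, so only differences between models fitted to the same data are meaningful.
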
The final model is shown below, after the Box-Cox transformation of the response variable score, variable selection using backward stepwise elimination, and testing for heteroskedasticity using the Cook-Weisberg test. Notice that the variable threeyearproduct was dropped from the final model; it had been the weakest of the variables. More importantly, utility patents are statistically significant in the model.

print(summary(model.final))
##
## Call:
## lm(formula = trans.score ~ stem + scienceprof + utilitypatents +
##     govrd + pubtechcomp + gsp)
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -6.626e+00  5.459e+00  -1.214 0.231795
## stem            4.687e+00  1.816e+00   2.581 0.013527 *
## scienceprof     2.661e+00  6.340e-01   4.197 0.000141 ***
## utilitypatents  2.898e+00  6.946e-01   4.172 0.000153 ***
## govrd           7.466e-01  3.812e-01   1.959 0.056997 .
## pubtechcomp     4.324e-01  1.154e-01   3.746 0.000554 ***
## gsp             8.706e-05  6.209e-05   1.402 0.168391
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.797 on 41 degrees of freedom
## Multiple R-squared:  0.8612, Adjusted R-squared:  0.8408
## F-statistic: 42.39 on 6 and 41 DF,  p-value: 4.802e-16

plot(model.final)
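
As a check on how to read the coefficient table above, the t value and p-value of any row can be recomputed from its estimate and standard error. The sketch below does this for utilitypatents and govrd using the rounded figures printed in the summary; the vector names are hypothetical.

```r
# Estimates and standard errors copied from the printed coefficient table.
est <- c(utilitypatents = 2.898, govrd = 0.7466)
se  <- c(utilitypatents = 0.6946, govrd = 0.3812)
df.resid <- 41   # residual degrees of freedom from the summary

# t value is estimate / standard error; the p-value is two-sided.
t.value <- est / se
p.value <- 2 * pt(abs(t.value), df = df.resid, lower.tail = FALSE)
round(t.value, 3)
signif(p.value, 3)
```

The utilitypatents row clears the 0.001 level, while govrd falls just above 0.05, which is why it carries only the '.' marker.
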

Additionally, I included the residual graphs of the final model. The graphs helped me with outlier detection and with checking for non-constant variance. For outlier detection, the leverage plots were especially helpful.

The model is quite accurate in predicting the scores of the states, especially if you plug the real values of each state into the model. However, the model has limits: it does not reveal any obvious trends or relationships between the variables. Furthermore, the model has only informed me that utility patents are statistically significant, and not much more. Therefore, combining the model with some data visualization, I came up with a few plots that help explain some of the trends in the dataset.

Below is a graph of the state scores (y-axis) against utility patents (x-axis). The graph studies the direct trend between score and utilitypatents, scienceprof, and stem. The legend on the side of the graph distinguishes states by size and color. The darker blue colors represent states with fewer STEM degree holders as a share of the total population, and the smaller circles represent states with fewer science professionals (and vice versa). The large circle with the arrow pointing at it is Massachusetts. There nonetheless seems to be a trend: the points get larger and lighter as they move up the y-axis and along the x-axis. For this particular graph, utility patents seem to be positively correlated with STEM degree holders and science professionals.

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = stem, size = scienceprof)) + geom_point()

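The visual impression of a positive correlation can be backed up numerically. The following is a minimal sketch on simulated values (the data frame `toy` is hypothetical; the real analysis would use bloombergdata), showing how the pairwise correlations and a significance test could be obtained.

```r
# Hypothetical values; the real analysis would use bloombergdata instead.
set.seed(3)
stem <- runif(48, 1, 4)
toy <- data.frame(
  stem = stem,
  scienceprof = 3 * stem + rnorm(48, sd = 1),
  utilitypatents = 0.8 * stem + rnorm(48, sd = 0.8)
)

# Pairwise Pearson correlations between the three plotted variables.
round(cor(toy[, c("utilitypatents", "stem", "scienceprof")]), 2)

# Test whether the utilitypatents-stem correlation differs from zero.
cor.test(toy$utilitypatents, toy$stem)
```
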
In the graph, notice how states with fewer utility patents tend to have lower STEM numbers and fewer science professionals. The graph below adds a regression line through the data points, where states above the line have higher innovation scores than the fit predicts for their share of utility patents.

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = govrd, size = gsp)) + geom_point()

I also wanted to compare utility patents with financial data such as government spending on research and development and gross state product per employed person. The most striking feature of this graph is the large number of dark-colored states. Even the states that have high innovation scores are dark. It seems that either government spending on research and development does not strongly relate to innovation score, or its effects are felt in other areas that are not included in the model. In addition, there seem to be scaling issues with the govrd data; however, even with a log transformation the trend is unclear. In spite of this, government spending on R&D is a variable worth studying in the future because it is marginally significant in the model (p = 0.057) and relates most directly to the state officials of Massachusetts.

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = govrd, size = gsp)) + geom_point() + stat_smooth()

Here is another graph that studies the relationship between government spending on R&D, utility patents, and public technology companies. Its importance is similar to that of the last one. The graph emphasizes the number of small circles, that is, the number of states with low government spending on R&D. Additionally, after a log transformation of govrd the size trend is still unclear.

ggplot(data = bloombergdata, aes(y = score, x = utilitypatents, color = pubtechcomp, size = govrd)) + geom_point()

After being relatively convinced that utility patents are an important measure of innovation, I looked at the ratio between the cost of stock for intellectual property and investment from our given dataset. The highest ratios (intellectual property divided by investment) were in software, electronics, chemical products, and motion pictures and sound recording.

Year: 2011, 2012, 2013, 2014
Computer and electronic products: 97.5, 82.6, 80.51851852, 84.40740741
Chemical products: 99.5, 318.5714286, 486, 565.25, 469.4

These ratios show that computer and electronics, chemical products, publishing industries, and motion pictures all have a cost of IP stock per investment that is higher than that of many other industries. These industries are likely to have the most infrastructure and support. They are likely to be the best future investments for governments and for states such as Massachusetts.

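The ratio described above (intellectual property divided by investment) is simple to compute and rank once the two columns are available. This sketch uses placeholder figures and hypothetical column names (`ip.stock`, `investment`), not the competition dataset.

```r
# Placeholder figures, not the competition dataset; column names are hypothetical.
ip <- data.frame(
  industry = c("Industry A", "Industry B", "Industry C"),
  ip.stock = c(500, 120, 900),
  investment = c(10, 40, 12)
)

# IP stock per unit of investment, then rank industries from highest to lowest.
ip$ratio <- ip$ip.stock / ip$investment
ip[order(ip$ratio, decreasing = TRUE), c("industry", "ratio")]
```
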

Conclusion:
My analysis focuses primarily on the development stage of the linear model of innovation and on utility patents. The development stage is an important aspect of the linear model of innovation because it has historical consistency and technological relevance. In addition, the properties of the development stage, which consist of improving and inventing useful materials, devices, products, systems, and processes, describe innovation. Through my analysis, I believe that utility patents are a decent measure of innovation in development. Utility patent statistics are unique because they measure how many new inventions are being patented.

After using several different statistical techniques, I arrived at several interesting findings about utility patents. Utility patents are positively correlated with science-related education and work. I also discovered that financial data, especially government spending on research and development, is difficult to analyze. It is not clear whether there is a trend between government spending and utility patents. In fact, government spending on research and development is difficult to study because of its generally complicated nature. There are likely many things that are factored into a state's research and development budget.

First, I would recommend that Massachusetts, and any other state, intensely evaluate its government spending on research and development. It's unclear how much a state should invest in research and development to be efficient or optimally innovative. Additionally, I recommend that Massachusetts and other states be conservative with their research and development budgets. Secondly, Massachusetts should evaluate the following industries: computer and electronics, chemical products, publishing industries, and motion pictures, because of their relatively high IP per investment ratio. Lastly, I recommend that Massachusetts invest in institutionalized innovation, where programs that promote the sciences can help improve utility patent acceptance rates.