Illustrating the Bias-Variance Tradeoff

Notes on statistical learning theory

Kevin Song

Department of Biomedical Informatics
Stanford University School of Medicine

July 14, 2017

Adapted from Prof. Jerome Friedman


Bias-Variance Tradeoff
Overview

▶ One of the most important concepts in statistical machine learning.
▶ A good understanding of the bias-variance tradeoff can improve model selection and parametrization in future data mining / prediction applications.

Calculating Model Accuracy
Using mean squared error

▶ Mean squared error is a metric of a regression model's inaccuracy:
  $$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$$
▶ $\mathrm{MSE} = \frac{1}{N}\,\mathrm{RSS}$, where $\mathrm{RSS} = \sum_{i=1}^{N}(\hat{y}_i - y_i)^2$.
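A minimal sketch of these two quantities in NumPy (the data values below are made up for illustration):

```python
import numpy as np

def rss(y_hat, y):
    """Residual sum of squares: total squared deviation of the predictions."""
    return np.sum((y_hat - y) ** 2)

def mse(y_hat, y):
    """Mean squared error: RSS averaged over the N observations."""
    return rss(y_hat, y) / len(y)

# Hypothetical observed values and model predictions.
y = np.array([3.0, 5.0, 7.5, 9.0])
y_hat = np.array([2.8, 5.4, 7.0, 9.3])

print(rss(y_hat, y))  # 0.54
print(mse(y_hat, y))  # 0.135, i.e. RSS / N
```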

Residual sum of squares
Visualizing RSS

Figure: RSS measures the total squared deviation of a dataset from its corresponding regression model's estimates. Note that $\mathrm{RSS} = \mathrm{SSR} = \mathrm{SSE} = N \cdot \mathrm{MSE}$.

Bias-Variance Decomposition of MSE

▶ MSE can be decomposed into distinct error terms, known as bias and variance.
▶ For a proof, see en.wikipedia.org/wiki/Mean_squared_error.
▶ $\mathrm{MSE} = \mathrm{Variance} + \mathrm{Bias}^2 + \text{irreducible error}$
▶ What exactly do bias and variance represent? (See the simulation sketch below.)
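To make the decomposition concrete, here is a hedged simulation sketch (the true function, noise level, and model class are our own choices, not from the slides): refit a deliberately simple model on many resampled training sets, then check that bias² + variance + noise matches the simulated squared error at a test point.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """True regression function (chosen for illustration)."""
    return np.sin(2 * x)

sigma = 0.3          # noise std; irreducible error is sigma**2
x0 = 0.5             # test point at which we decompose the error
n, trials = 30, 5000

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    slope, intercept = np.polyfit(x, y, 1)   # underfit on purpose -> bias
    preds[t] = slope * x0 + intercept

bias2 = (preds.mean() - f(x0)) ** 2
variance = preds.var()
# Squared error against fresh noisy observations at x0.
y_new = f(x0) + rng.normal(0, sigma, trials)
sim_mse = np.mean((preds - y_new) ** 2)

print(f"bias^2 + variance + sigma^2 = {bias2 + variance + sigma**2:.4f}")
print(f"simulated MSE               = {sim_mse:.4f}")  # should agree closely
```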

Illustrating the Bias-Variance Tradeoff

Figure: As the function space enlarges, bias decreases and variance increases; as the function space shrinks, bias increases and variance decreases.
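The figure's qualitative message can be reproduced numerically. In the sketch below (our own setup, not from the slides), the function space grows with the polynomial degree; bias² falls and variance rises as the degree increases.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)   # true function, chosen for illustration
x0, sigma, n, trials = 0.3, 0.2, 40, 2000

for degree in (1, 3, 7):              # progressively larger function spaces
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, sigma, n)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
    print(f"degree {degree}: bias^2 = {(preds.mean() - f(x0))**2:.5f}, "
          f"variance = {preds.var():.5f}")
```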

Conclusions and Takeaways

▶ Using the principles of the bias-variance tradeoff, we can infer that boosted trees are high variance / low bias, and that random forests are low variance / high bias.
▶ Extreme examples: ordinary least squares is generally high bias / low variance, and deep neural networks are generally low bias / high variance. (For neural networks, high variance can be somewhat alleviated by using an extremely large training dataset.)
▶ Knowledge of the bias-variance tradeoff and of function space size / model complexity should influence our model selection and parametrization procedures (e.g., whether to use one algorithm over another and whether to use regularization; see the ridge sketch below).
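As one concrete instance of the regularization point, the penalty strength in ridge regression is a direct knob on this tradeoff. A minimal closed-form sketch (our own illustration, not from the slides; the data are synthetic):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^(-1) X'y.
    Larger lam shrinks the coefficients: lower variance, higher bias."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + rng.normal(0, 1.0, 50)

for lam in (0.0, 1.0, 100.0):
    beta = ridge_fit(X, y, lam)
    print(f"lam = {lam:6.1f}  ||beta|| = {np.linalg.norm(beta):.3f}")
# The coefficient norm shrinks as lam grows, trading variance for bias;
# lam = 0 recovers ordinary least squares.
```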
