- One of the most important concepts in statistical machine learning.
- A good understanding of the bias-variance tradeoff can improve model selection and parametrization in future data mining / prediction applications.
Adapted from Prof. Jerome Friedman
Calculating Model Accuracy Using Mean Squared Error
- Mean squared error (MSE) is a metric of a regression model's inaccuracy (computed in the sketch below).
- $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$
- $\mathrm{MSE} = \frac{1}{N} \mathrm{RSS}$, where $\mathrm{RSS} = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$
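As a quick numeric illustration (not from the original slides), here is a minimal NumPy sketch of both formulas; the y_true and y_pred arrays are hypothetical values.

```python
import numpy as np

# Hypothetical observed targets y_i and model predictions yhat_i.
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.8, 5.4, 7.1, 9.6])

# RSS: total squared deviation of the predictions from the targets.
rss = np.sum((y_pred - y_true) ** 2)

# MSE is the RSS averaged over the N observations.
n = len(y_true)
mse = rss / n

print(f"RSS = {rss:.3f}, MSE = {mse:.3f}, N * MSE = {n * mse:.3f}")  # N * MSE == RSS
```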
Visualizing the Residual Sum of Squares (RSS)
Figure: RSS measures the total deviation of a dataset from its corresponding regression model's estimates. $\mathrm{RSS} = \mathrm{SSR} = \mathrm{SSE} = N \cdot \mathrm{MSE}$.

Bias-Variance Decomposition of MSE
- MSE can be decomposed into separate error terms, known as bias and variance.
- For a proof, see en.wikipedia.org/wiki/Mean_squared_error.
- $\mathrm{MSE} = \mathrm{Variance} + \mathrm{Bias}^2 + \text{irreducible error}$
- What exactly do bias and variance represent? (The simulation sketch below checks the decomposition numerically.)
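To make the decomposition concrete, here is a hedged Monte Carlo sketch (not from the slides): it refits a cubic polynomial to many independently drawn training sets and compares the two sides of the identity at a single test point. The sine target, noise level, and sample sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed ground-truth regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

x0, noise_sd, n_train, n_repeats = 0.25, 0.3, 30, 5000

# Refit the same model class (a cubic polynomial) on many independent
# training sets and record each fit's prediction at the test point x0.
preds = np.empty(n_repeats)
for r in range(n_repeats):
    x = rng.uniform(0, 1, n_train)
    y = true_f(x) + rng.normal(0, noise_sd, n_train)
    preds[r] = np.polyval(np.polyfit(x, y, 3), x0)

# Expected squared error against fresh noisy observations at x0.
y0 = true_f(x0) + rng.normal(0, noise_sd, n_repeats)
mse = np.mean((preds - y0) ** 2)

bias_sq = (preds.mean() - true_f(x0)) ** 2   # squared bias of the average fit
variance = preds.var()                        # spread of fits across training sets
print(f"MSE                       = {mse:.4f}")
print(f"Variance + Bias^2 + noise = {variance + bias_sq + noise_sd**2:.4f}")
```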
Illustrating the Bias-Variance Tradeoff
Figure: As the function space enlarges, bias decreases and variance increases. As the function space shrinks, bias increases and variance decreases. (Illustrated numerically in the sketch below.)
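The same effect can be reproduced numerically. In this hedged sketch (not from the slides), polynomial degree stands in for function-space size; the degrees and data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed ground-truth regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

x0, noise_sd, n_train, n_repeats = 0.25, 0.3, 30, 2000

# Polynomial degree is a proxy for the size of the function space:
# a higher degree means a richer space, hence lower bias but higher variance.
for degree in (1, 3, 9):
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        preds[r] = np.polyval(np.polyfit(x, y, degree), x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {preds.var():.4f}")
```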
Conclusions and Takeaways
- Using the principles of the bias-variance tradeoff, we can infer that boosted trees are high variance / low bias, and that random forests are low variance / high bias.
- Extreme examples: ordinary least squares is generally high bias / low variance, and deep neural networks are generally low bias / high variance. (For neural networks, high variance can be somewhat alleviated by using an extremely large training dataset.)
- Knowledge of the bias-variance tradeoff and function space size / model complexity should influence our model selection and parametrization procedures (e.g., whether to use one algorithm over another and whether to use regularization, as in the ridge sketch below).
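As one way to see the regularization point in code, here is a hedged ridge-regression sketch (not from the slides): increasing the penalty lam trades variance for bias on a degree-9 polynomial fit. The penalty values, fit_ridge helper, and data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed ground-truth regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

def fit_ridge(x, y, degree, lam):
    # Ridge solution (X^T X + lam * I)^{-1} X^T y; lam -> 0 approaches OLS.
    X = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

x0, degree, noise_sd, n_train, n_repeats = 0.25, 9, 0.3, 30, 2000
phi0 = np.vander(np.array([x0]), degree + 1, increasing=True)[0]  # features at x0

# Sweep the penalty: a tiny lam approximates unregularized least squares
# (low bias, high variance); a large lam shrinks the fit (higher bias, lower variance).
for lam in (1e-6, 1e-2, 1.0):
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        preds[r] = phi0 @ fit_ridge(x, y, degree, lam)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    print(f"lam = {lam:g}: bias^2 = {bias_sq:.4f}, variance = {preds.var():.4f}")
```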